<!DOCTYPE html>
<html lang="" xml:lang="">
<head>

  <meta charset="utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <title>Chapter 1 Data Structures and Datasets | R Data Analysis Team Learning</title>
  <meta name="description" content="Chapter 1 Data Structures and Datasets | R Data Analysis Team Learning" />
  <meta name="generator" content="bookdown 0.22 and GitBook 2.6.7" />

  <meta property="og:title" content="Chapter 1 Data Structures and Datasets | R Data Analysis Team Learning" />
  <meta property="og:type" content="book" />
  
  
  
  

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="Chapter 1 Data Structures and Datasets | R Data Analysis Team Learning" />
  
  
  

<meta name="author" content="张晋、杨佳达、牧小熊、杨杨卓然、姚昱君" />



  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta name="apple-mobile-web-app-capable" content="yes" />
  <meta name="apple-mobile-web-app-status-bar-style" content="black" />
  
  
<link rel="prev" href="task-00.html"/>
<link rel="next" href="task-02.html"/>
<script src="libs/header-attrs-2.9/header-attrs.js"></script>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />









<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>


<style type="text/css">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
  { counter-reset: source-line 0; }
pre.numberSource code > span
  { position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
  { content: counter(source-line);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {   }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>


</head>

<body>



  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">


      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">R Data Analysis Team Learning</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="task-01" class="section level1" number="1">
<h1><span class="header-section-number">Chapter 1</span> Data Structures and Datasets</h1>
<p><img src="image/task01_data_structure.jpg" style="width:100.0%" /></p>
<div id="准备工作" class="section level2" number="1.1">
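<h2><span class="header-section-number">1.1</span> Preparation</h2>
<p>This team-learning session aims to get you started with the basic programming logic of R and to introduce some core concepts of R programming, including the various data types and how to read and save datasets.</p>
<p>Before we start, remember to switch to the project dedicated to this course in RStudio and open an R script file or an R Markdown file (see the introductory chapter for details).</p>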
</div>
<div id="编码基础" class="section level2" number="1.2">
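<h2><span class="header-section-number">1.2</span> Coding Basics</h2>
<p>Let's start with some basic coding operations. To run code in RStudio, you can type it into the Console and press Enter. Code run this way is recorded in the current project's <code>.Rhistory</code> file and can also be found in the History pane at the top right of the RStudio window, but it is not saved as a script file. We generally use this approach only for quick commands or simple calculations. More commonly, you write the code in a script file, select the relevant lines, and run them by clicking Run at the top of the editor or by pressing the shortcut (<code>Ctrl</code> + <code>Enter</code>).</p>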
<div id="算术" class="section level3" number="1.2.1">
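<h3><span class="header-section-number">1.2.1</span> Arithmetic</h3>
<p>You can run calculations directly. The arithmetic operators include addition <code>+</code>, subtraction <code>-</code>, multiplication <code>*</code>, division <code>/</code>, exponentiation <code>^</code>, and remainder (modulo) <code>%%</code>. Note that the square root has its own dedicated function, <code>sqrt</code>.</p>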
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="task-01.html#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="sc">+</span> <span class="dv">1</span></span></code></pre></div>
<pre><code>## [1] 2</code></pre>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="task-01.html#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="sc">-</span> <span class="dv">1</span></span></code></pre></div>
<pre><code>## [1] 0</code></pre>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="task-01.html#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="sc">*</span> <span class="dv">2</span></span></code></pre></div>
<pre><code>## [1] 2</code></pre>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="task-01.html#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="sc">/</span> <span class="dv">2</span></span></code></pre></div>
<pre><code>## [1] 0.5</code></pre>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="task-01.html#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="dv">3</span> <span class="sc">%%</span> <span class="dv">2</span></span></code></pre></div>
<pre><code>## [1] 1</code></pre>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="task-01.html#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="dv">2</span><span class="sc">^</span>(<span class="dv">1</span> <span class="sc">/</span> <span class="dv">2</span>)</span></code></pre></div>
<pre><code>## [1] 1.414214</code></pre>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="task-01.html#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sqrt</span>(<span class="dv">2</span>)</span></code></pre></div>
<pre><code>## [1] 1.414214</code></pre>
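<p>As a small supplementary sketch (not part of the original examples), the block below shows two details of the operators above that are easy to trip over: the integer-division operator <code>%/%</code>, which pairs with <code>%%</code>, and the fact that <code>^</code> binds more tightly than a leading minus sign. The specific numbers are chosen only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Integer division keeps only the whole part of the quotient
7 %/% 2                  # 3

# Integer quotient and remainder together rebuild the original number
(7 %/% 2) * 2 + 7 %% 2   # 7

# Exponentiation is evaluated before the unary minus, so this is -(2^2)
-2^2                     # -4

# Parentheses make the base negative
(-2)^2                   # 4</code></pre></div>
<p>When in doubt about the order of evaluation, adding parentheses is usually the safest choice.</p>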
</div>
<div id="赋值" class="section level3" number="1.2.2">
<h3><span class="header-section-number">1.2.2</span> Assignment</h3>
<p>In R, we can give a name to a "thing", which may be a value, a vector, a function, and so on, so that we can later retrieve whatever is stored under that name.</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="task-01.html#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Assign the number 42 to a variable named x</span></span>
<span id="cb21-2"><a href="task-01.html#cb21-2" aria-hidden="true" tabindex="-1"></a>x <span class="ot">&lt;-</span> <span class="dv">42</span></span>
<span id="cb21-3"><a href="task-01.html#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Running the name of an object in R</span></span>
<span id="cb21-4"><a href="task-01.html#cb21-4" aria-hidden="true" tabindex="-1"></a><span class="co"># makes R print the value of that object</span></span>
<span id="cb21-5"><a href="task-01.html#cb21-5" aria-hidden="true" tabindex="-1"></a>x</span></code></pre></div>
<pre><code>## [1] 42</code></pre>
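<p>R has three basic assignment operators:</p>
<ol style="list-style-type: decimal">
<li>The left arrow <code>&lt;-</code> gives the value on its right the name on its left; in other words, it stores the right-hand value under the left-hand name;</li>
<li>The right arrow <code>-&gt;</code> stores the value on its left under the name on its right;</li>
<li>The equals sign <code>=</code> stores the value on its right under the name on its left (same as 1).</li>
</ol>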
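<p>Some early keyboards had a dedicated left-arrow key. Later keyboards dropped it, but the habit of assigning with an arrow survived. Which operator you use is a matter of personal preference, but in most cases we recommend the arrow (especially the left arrow). This is one way in which R differs from other languages, and the reasons are as follows:</p>
<ol style="list-style-type: decimal">
<li>An arrow makes the direction of assignment explicit, which the equals sign cannot do;</li>
<li>At the top level the equals sign performs assignment, but inside a function call it sets an argument (sometimes described as assigning at the function level). If you are not careful about this ambiguity, it can cause errors. An arrow, by contrast, is still an assignment even when used inside a function call;</li>
<li>Arrows can be chained (<code>a &lt;- b &lt;- 42</code>), even in different directions (<code>a &lt;- 42 -&gt; b</code>) (best avoided!);</li>
<li>Although they are not covered in this course, the more advanced assignment operators <code>&lt;&lt;-</code> and <code>-&gt;&gt;</code> are also arrows, pointing left and right respectively;</li>
<li>Using <code>=</code> alongside <code>==</code> (the equality test) hurts readability (<code>a &lt;- 1 == 2</code> vs <code>a = 1 == 2</code>).</li>
</ol>
<p>In short, <strong>always use <code>&lt;-</code> for assignment</strong> and <code>=</code> for setting arguments (discussed later).</p>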
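<p>The difference between assigning and setting an argument can be seen in a minimal sketch like the one below. The variable names (<code>a</code>, <code>b</code>, <code>d</code>, <code>v</code>, <code>w</code>, <code>total</code>) are made up for illustration, and the <code>exists()</code> check assumes a fresh session in which nothing called <code>length.out</code> has been defined.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Three equivalent ways to store the value 42 under a name
a &lt;- 42
42 -&gt; b
d = 42

# Inside a call, `=` only names the argument;
# no variable called `length.out` is created
v &lt;- seq_len(length.out = 3)
exists("length.out")   # FALSE in a fresh session

# Inside a call, `&lt;-` is still an assignment:
# `w` is created in the calling environment as a side effect
total &lt;- sum(w &lt;- c(1, 2, 3))
w       # 1 2 3
total   # 6</code></pre></div>
<p>Assigning inside a call, as in the last example, works but is hard to read; it is shown here only to make the distinction visible.</p>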
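<p>In RStudio you can type <code>&lt;-</code> with the shortcut <code>Alt</code> + <code>-</code>, so you do not have to type its two characters separately each time. Another benefit is that RStudio detects the language you are currently writing and inserts the appropriate assignment operator. In most cases that means R's <code>&lt;-</code>; if you are writing Python in RStudio, it inserts <code>=</code> instead.</p>
<p>Arithmetic that previously operated on plain numbers can now use variable names in place of the actual values.</p>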
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="task-01.html#cb23-1" aria-hidden="true" tabindex="-1"></a>y <span class="ot">&lt;-</span> <span class="dv">21</span></span>
<span id="cb23-2"><a href="task-01.html#cb23-2" aria-hidden="true" tabindex="-1"></a>x <span class="sc">+</span> y</span></code></pre></div>
<pre><code>## [1] 63</code></pre>
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="task-01.html#cb25-1" aria-hidden="true" tabindex="-1"></a>x <span class="ot">&lt;-</span> x <span class="sc">+</span> y</span></code></pre></div>
</div>
<div id="函数" class="section level3" number="1.2.3">
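<h3><span class="header-section-number">1.2.3</span> Functions</h3>
<p>R is an impure functional programming language, which makes it different from the object-oriented programming (OOP) languages you may be more familiar with (such as Python). This means that in R, rather than thinking in terms of classes and objects, we more often define functions and reason about their inputs and outputs to carry out computations. (If this does not mean anything to you yet, feel free to skip this paragraph.)</p>
<p>In R, every operation is carried out through a function. We can store a function under a name using the same assignment operator as before (<code>&lt;-</code>). Consider the following example:</p>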
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="task-01.html#cb26-1" aria-hidden="true" tabindex="-1"></a>addone <span class="ot">&lt;-</span> <span class="cf">function</span>(<span class="at">x =</span> <span class="dv">0</span>) {</span>
<span id="cb26-2"><a href="task-01.html#cb26-2" aria-hidden="true" tabindex="-1"></a>  x <span class="sc">+</span> <span class="dv">1</span></span>
<span id="cb26-3"><a href="task-01.html#cb26-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Here I created a function named <code>addone</code>. It stores its input in a parameter named <code>x</code> inside the function, adds one to <code>x</code>, and returns the result. If no input is given, <code>x</code> defaults to <code>0</code>.</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="task-01.html#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Applied to a number</span></span>
<span id="cb27-2"><a href="task-01.html#cb27-2" aria-hidden="true" tabindex="-1"></a><span class="fu">addone</span>(<span class="dv">42</span>)</span></code></pre></div>
<pre><code>## [1] 43</code></pre>
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="task-01.html#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Applied to a variable</span></span>
<span id="cb29-2"><a href="task-01.html#cb29-2" aria-hidden="true" tabindex="-1"></a>y <span class="ot">&lt;-</span> <span class="dv">42</span></span>
<span id="cb29-3"><a href="task-01.html#cb29-3" aria-hidden="true" tabindex="-1"></a><span class="fu">addone</span>(y)</span></code></pre></div>
<pre><code>## [1] 43</code></pre>
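<p>As you can see, you call a function by writing its name followed by parentheses, with the arguments inside. When a function has several optional arguments, it is a good idea to state each argument's name explicitly with the equals sign <code>=</code>. This is the argument-setting use of the equals sign mentioned earlier.</p>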
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="task-01.html#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="fu">addone</span>(<span class="at">x =</span> <span class="dv">42</span>)</span></code></pre></div>
<pre><code>## [1] 43</code></pre>
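<p>Naming arguments matters more once a function has more than one of them. The following sketch uses a hypothetical two-argument variant, <code>add_n</code>, which is not part of the original text; it only illustrates how defaults and named arguments interact.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: add `n` to `x`; both arguments have defaults
add_n &lt;- function(x = 0, n = 1) {
  x + n
}

add_n(42)              # 43: `n` falls back to its default of 1
add_n(42, n = 8)       # 50: `n` is set by name
add_n(n = 8, x = 42)   # 50: named arguments can appear in any order
add_n()                # 1: both defaults are used</code></pre></div>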
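<p>If you run the function's name in the console without parentheses, R prints the function's contents, that is, its source code, just as it prints the value of any other object:</p>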
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="task-01.html#cb33-1" aria-hidden="true" tabindex="-1"></a>addone</span></code></pre></div>
<pre><code>## function(x = 0) {
##   x + 1
## }
## &lt;environment: 0x0000000015e84bc8&gt;</code></pre>
<p>When you finish a complex calculation, remember to store the result under a name; otherwise it is not saved and merely flashes by in the console.</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="task-01.html#cb35-1" aria-hidden="true" tabindex="-1"></a>y <span class="ot">&lt;-</span> <span class="dv">42</span></span>
<span id="cb35-2"><a href="task-01.html#cb35-2" aria-hidden="true" tabindex="-1"></a>y_plusone <span class="ot">&lt;-</span> <span class="fu">addone</span>(y)</span></code></pre></div>
</div>
<div id="循环loop" class="section level3" number="1.2.4">
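<h3><span class="header-section-number">1.2.4</span> Loops</h3>
<p>One important reason to write code is that it lets you repeat the same operation, or a series of operations that follow a pattern, many times. That is what loops are for.</p>
<p>The loop constructs in R are <code>for</code>, <code>while</code>, and <code>repeat</code>. Here we use a simple example to introduce the most commonly used one, the <code>for</code> loop:</p>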
<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="task-01.html#cb36-1" aria-hidden="true" tabindex="-1"></a>x <span class="ot">&lt;-</span> <span class="dv">0</span></span>
<span id="cb36-2"><a href="task-01.html#cb36-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span>(i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">3</span>){</span>
<span id="cb36-3"><a href="task-01.html#cb36-3" aria-hidden="true" tabindex="-1"></a>  x <span class="ot">&lt;-</span> x <span class="sc">+</span> i</span>
<span id="cb36-4"><a href="task-01.html#cb36-4" aria-hidden="true" tabindex="-1"></a>  <span class="fu">print</span>(x)</span>
<span id="cb36-5"><a href="task-01.html#cb36-5" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<pre><code>## [1] 1
## [1] 3
## [1] 6</code></pre>
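<p>At the start we set <code>x</code> to 0. In the loop that follows, the parentheses right after <code>for</code> name the variable that changes in each iteration, here <code>i</code>. After <code>in</code> come the values that <code>i</code> takes across the iterations, here the integers from 1 to 3. Finally, the braces contain what each iteration does: add the value of <code>i</code> to <code>x</code>, assign the result back to <code>x</code>, and print it.</p>
<p>Step by step:</p>
<p>Iteration 1: <code>x</code> starts at 0 and <code>i</code> is 1; after the assignment <code>x</code> becomes 1, which is printed before moving on to iteration 2;<br />
Iteration 2: <code>x</code> starts at 1 and <code>i</code> is 2; after the assignment <code>x</code> becomes 3, which is printed before moving on to iteration 3;<br />
Iteration 3: <code>x</code> starts at 3 and <code>i</code> is 3; after the assignment <code>x</code> becomes 6, which is printed, and the loop ends.</p>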
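<p>For comparison, the same running total can be written with <code>while</code>, one of the other loop constructs mentioned above. This is only a sketch: with <code>while</code> we have to create and advance the counter <code>i</code> ourselves, which <code>for</code> does automatically.</p>

```r
x <- 0
i <- 1                # the counter must be created by hand
while (i <= 3) {      # keep looping as long as the condition holds
  x <- x + i
  print(x)
  i <- i + 1          # advance the counter, otherwise the loop never ends
}
```

<p>The loop prints the same three values as the <code>for</code> version and stops once <code>i</code> reaches 4.</p>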
</div>
<div id="管道pipe" class="section level3" number="1.2.5">
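<h3><span class="header-section-number">1.2.5</span> Pipes</h3>
<p>Suppose we want to apply several functions to one object, for example the <code>addone</code> function we just defined together with the newly defined <code>addtwo</code> and <code>addthree</code>. With ordinary function calls, we nest one inside another:</p>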
<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="task-01.html#cb38-1" aria-hidden="true" tabindex="-1"></a>addone <span class="ot">&lt;-</span> <span class="cf">function</span>(x) x<span class="sc">+</span><span class="dv">1</span></span>
<span id="cb38-2"><a href="task-01.html#cb38-2" aria-hidden="true" tabindex="-1"></a>addtwo <span class="ot">&lt;-</span> <span class="cf">function</span>(x) x<span class="sc">+</span><span class="dv">2</span></span>
<span id="cb38-3"><a href="task-01.html#cb38-3" aria-hidden="true" tabindex="-1"></a>addthree <span class="ot">&lt;-</span> <span class="cf">function</span>(x) x<span class="sc">+</span><span class="dv">3</span></span>
<span id="cb38-4"><a href="task-01.html#cb38-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb38-5"><a href="task-01.html#cb38-5" aria-hidden="true" tabindex="-1"></a>x <span class="ot">&lt;-</span> <span class="dv">0</span></span>
<span id="cb38-6"><a href="task-01.html#cb38-6" aria-hidden="true" tabindex="-1"></a><span class="fu">addthree</span>(<span class="fu">addtwo</span>(<span class="fu">addone</span>(x)))</span></code></pre></div>
<pre><code>## [1] 6</code></pre>
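<p>With this conventional approach, both the order in which the functions run and the order in which we must read them go from the inside out. In the example above, <code>addone</code> first adds 1 to 0, then <code>addtwo</code> is applied, and finally <code>addthree</code>. The drawback is obvious: poor readability. Imagine applying a dozen functions in a row to a dataset, each with its own arguments; written this way, the code would inevitably become a monster that is hard to read.</p>
<p>The <code>magrittr</code> package offers another way to call functions: chaining them with the <code>%&gt;%</code> operator, in the style of a method chain. This operator is called the pipe. Rewritten with pipes, the sequence of operations above becomes:</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="task-01.html#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="co"># the tidyverse also provides the pipe operator</span></span>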
<span id="cb40-2"><a href="task-01.html#cb40-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyverse)</span>
<span id="cb40-3"><a href="task-01.html#cb40-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb40-4"><a href="task-01.html#cb40-4" aria-hidden="true" tabindex="-1"></a>x <span class="sc">%&gt;%</span> </span>
<span id="cb40-5"><a href="task-01.html#cb40-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">addone</span>() <span class="sc">%&gt;%</span> </span>
<span id="cb40-6"><a href="task-01.html#cb40-6" aria-hidden="true" tabindex="-1"></a>  <span class="fu">addtwo</span>() <span class="sc">%&gt;%</span> </span>
<span id="cb40-7"><a href="task-01.html#cb40-7" aria-hidden="true" tabindex="-1"></a>  <span class="fu">addthree</span>()</span></code></pre></div>
<pre><code>## [1] 6</code></pre>
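<p>Put simply, this operator means "place the result of the previous step in the first argument position of the function in the next step". In this example, <code>x</code> is passed as the first argument of <code>addone</code>, which adds one. The result of <code>addone</code> becomes the first argument of <code>addtwo</code>, which adds two, and that result becomes the first argument of <code>addthree</code>, which adds three and gives the final result.</p>
<p>After rewriting with pipes, the code is much more readable: instead of reading "from the inside out", we read "top to bottom, left to right". Although it requires loading an extra package, the pipe is widely used in data analysis because it matches both the natural reading order and the step-by-step nature of data cleaning workflows.</p>
<p>For the detailed rules on using the pipe operator, see <code>?`%&gt;%`</code>.</p>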
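<p>The "first argument" rule is easiest to see when the next function takes further arguments. To keep the sketch below free of extra packages, it uses the native pipe <code>|&gt;</code> instead of <code>%&gt;%</code>; this is a swapped-in technique that assumes R 4.1 or later. For simple calls like these, the two pipes behave the same way.</p>

```r
# Assumes R >= 4.1, where the native pipe |> is available
addone   <- function(x) x + 1
addtwo   <- function(x) x + 2
addthree <- function(x) x + 3

# Same chain as before, written with the native pipe
0 |> addone() |> addtwo() |> addthree()

# The piped value fills the first argument; other arguments follow as usual,
# so this is the same call as round(3.14159, digits = 2)
3.14159 |> round(digits = 2)
```
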
</div>
</div>
<div id="数据类型" class="section level2" number="1.3">
<h2><span class="header-section-number">1.3</span> Data types</h2>
<div id="基础数据类型" class="section level3" number="1.3.1">
<h3><span class="header-section-number">1.3.1</span> Basic data types</h3>
<p>R has five commonly used basic data types: three numeric types, one logical type, and one character type.</p>
<div id="数值型" class="section level4" number="1.3.1.1">
<h4><span class="header-section-number">1.3.1.1</span> Numeric types</h4>
<p>There are three numeric types: the default real-number type (double), the integer type (integer), and the complex type (complex):</p>
<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb42-1"><a href="task-01.html#cb42-1" aria-hidden="true" tabindex="-1"></a><span class="co"># numeric</span></span>
<span id="cb42-2"><a href="task-01.html#cb42-2" aria-hidden="true" tabindex="-1"></a>a <span class="ot">&lt;-</span> <span class="fl">132.2345</span></span>
<span id="cb42-3"><a href="task-01.html#cb42-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Inf</span></span>
<span id="cb42-4"><a href="task-01.html#cb42-4" aria-hidden="true" tabindex="-1"></a><span class="co"># integer</span></span>
<span id="cb42-5"><a href="task-01.html#cb42-5" aria-hidden="true" tabindex="-1"></a>b <span class="ot">&lt;-</span> 132L</span>
<span id="cb42-6"><a href="task-01.html#cb42-6" aria-hidden="true" tabindex="-1"></a><span class="co"># complex</span></span>
<span id="cb42-7"><a href="task-01.html#cb42-7" aria-hidden="true" tabindex="-1"></a>c <span class="ot">&lt;-</span> <span class="dv">2</span> <span class="sc">+</span> 3i</span></code></pre></div>
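<p>A double takes up 8 bytes per value and is the most common numeric type. Unless you do something special, the numbers you normally see are of this type.</p>
<p>The integer type, as its name suggests, holds only whole numbers with no fractional part. Appending an uppercase L to a whole number marks it as integer type. Without the L, a whole number you type is still stored as a double, not as an integer. The difference is that a whole number stored as a double takes 8 bytes, just like any non-integer, while an integer takes only 4 bytes. In everyday use the difference is small, but if your data is large and consists entirely of whole numbers, using the integer type is recommended to save memory.</p>
<p>The complex type holds numbers with an imaginary part. Append an imaginary part to the real part, using the lowercase letter i for the imaginary unit, and the value is of complex type. Since data analysis rarely involves complex numbers, we do not discuss the complex type in this study group.</p>
<p>To find out the type of a value, use <code>typeof()</code>:</p>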
<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="task-01.html#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(a)</span></code></pre></div>
<pre><code>## [1] &quot;double&quot;</code></pre>
<div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="task-01.html#cb45-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(b)</span></code></pre></div>
<pre><code>## [1] &quot;integer&quot;</code></pre>
<div class="sourceCode" id="cb47"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb47-1"><a href="task-01.html#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(c)</span></code></pre></div>
<pre><code>## [1] &quot;complex&quot;</code></pre>
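<p>The 8-byte versus 4-byte difference becomes visible on longer vectors. The sketch below compares a double vector and an integer vector of the same length; the variable names and the length of one million are arbitrary choices for illustration. The sizes reported by <code>object.size()</code> include a small fixed overhead per vector.</p>

```r
typeof(5)     # a whole number typed without L is stored as a double
typeof(5L)    # with L it is stored as an integer

n <- 1e6
vec_dbl <- numeric(n)   # one million doubles (all zeros)
vec_int <- integer(n)   # one million integers (all zeros)

object.size(vec_dbl)    # roughly 8 bytes per element
object.size(vec_int)    # roughly 4 bytes per element
```
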
</div>
<div id="逻辑型" class="section level4" number="1.3.1.2">
<h4><span class="header-section-number">1.3.1.2</span> Logical type</h4>
<p>Logical data has only two values, <code>TRUE</code> (<code>T</code>) and <code>FALSE</code> (<code>F</code>):</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb49-1"><a href="task-01.html#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="cn">TRUE</span></span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb51-1"><a href="task-01.html#cb51-1" aria-hidden="true" tabindex="-1"></a>T</span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode" id="cb53"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb53-1"><a href="task-01.html#cb53-1" aria-hidden="true" tabindex="-1"></a><span class="cn">FALSE</span></span></code></pre></div>
<pre><code>## [1] FALSE</code></pre>
<div class="sourceCode" id="cb55"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb55-1"><a href="task-01.html#cb55-1" aria-hidden="true" tabindex="-1"></a>F</span></code></pre></div>
<pre><code>## [1] FALSE</code></pre>
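<p>Although the one-letter abbreviations behave the same as the full words, it is good practice to always write <code>TRUE</code> and <code>FALSE</code> in full, in capitals. This improves readability and reduces errors caused by naming: <code>TRUE</code> and <code>FALSE</code> are reserved words, whereas <code>T</code> and <code>F</code> are ordinary variables that can be reassigned. For example, when working with time series some users like to name the final time index <code>T</code>, and from then on <code>T</code> can no longer stand for <code>TRUE</code>.</p>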
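<p>A short sketch of how this goes wrong. The value 10 is an arbitrary stand-in for a time horizon.</p>

```r
T <- 10        # e.g. the last time index; this masks the built-in shorthand
isTRUE(T)      # FALSE: T is now the number 10, not a logical value
isTRUE(TRUE)   # TRUE: the reserved word TRUE cannot be reassigned

rm(T)          # remove our variable so that T refers to TRUE again
isTRUE(T)
```
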
<p>Logical data naturally brings us to logical operators. We consider only three here: "and" <code>&amp;</code>, "or" <code>|</code>, and "not" <code>!</code>.</p>
<div class="sourceCode" id="cb57"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb57-1"><a href="task-01.html#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="cn">TRUE</span> <span class="sc">&amp;</span> <span class="cn">FALSE</span></span></code></pre></div>
<pre><code>## [1] FALSE</code></pre>
<div class="sourceCode" id="cb59"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb59-1"><a href="task-01.html#cb59-1" aria-hidden="true" tabindex="-1"></a><span class="cn">TRUE</span> <span class="sc">|</span> <span class="cn">FALSE</span></span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode" id="cb61"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb61-1"><a href="task-01.html#cb61-1" aria-hidden="true" tabindex="-1"></a><span class="sc">!</span><span class="cn">TRUE</span></span></code></pre></div>
<pre><code>## [1] FALSE</code></pre>
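<p>In practice, logical values usually come from comparisons, and the operators then combine them element by element. A minimal sketch with made-up ages:</p>

```r
age <- c(15, 32, 67)      # hypothetical data

is_adult  <- age >= 18    # FALSE  TRUE  TRUE
is_senior <- age >= 65    # FALSE FALSE  TRUE

is_adult & !is_senior     # adults who are not seniors
is_adult | is_senior      # adults or seniors
```
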
</div>
<div id="字符型" class="section level4" number="1.3.1.3">
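<h4><span class="header-section-number">1.3.1.3</span> Character type</h4>
<p>Character data can be summed up as "any value wrapped in quotation marks". It can be a letter, a word, a sentence, or any number or logical value enclosed in quotes.</p>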
<div class="sourceCode" id="cb63"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb63-1"><a href="task-01.html#cb63-1" aria-hidden="true" tabindex="-1"></a>string_a <span class="ot">&lt;-</span> <span class="st">&quot;A&quot;</span></span>
<span id="cb63-2"><a href="task-01.html#cb63-2" aria-hidden="true" tabindex="-1"></a>string_b <span class="ot">&lt;-</span> <span class="st">&quot;letter&quot;</span></span>
<span id="cb63-3"><a href="task-01.html#cb63-3" aria-hidden="true" tabindex="-1"></a>string_c <span class="ot">&lt;-</span> <span class="st">&quot;This is a sentence.&quot;</span></span>
<span id="cb63-4"><a href="task-01.html#cb63-4" aria-hidden="true" tabindex="-1"></a>string_d <span class="ot">&lt;-</span> <span class="st">&quot;42&quot;</span></span>
<span id="cb63-5"><a href="task-01.html#cb63-5" aria-hidden="true" tabindex="-1"></a>string_e <span class="ot">&lt;-</span> <span class="st">&quot;TRUE&quot;</span></span></code></pre></div>
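<p>Even a number, or the logical values <code>TRUE</code> and <code>FALSE</code>, becomes character data once it is wrapped in quotes, and no longer behaves as a numeric or logical value. Take care to tell them apart.</p>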
<div class="sourceCode" id="cb64"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb64-1"><a href="task-01.html#cb64-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(<span class="dv">42</span>)</span></code></pre></div>
<pre><code>## [1] &quot;double&quot;</code></pre>
<div class="sourceCode" id="cb66"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb66-1"><a href="task-01.html#cb66-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(<span class="st">&quot;42&quot;</span>)</span></code></pre></div>
<pre><code>## [1] &quot;character&quot;</code></pre>
<div class="sourceCode" id="cb68"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb68-1"><a href="task-01.html#cb68-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(<span class="cn">TRUE</span>)</span></code></pre></div>
<pre><code>## [1] &quot;logical&quot;</code></pre>
<div class="sourceCode" id="cb70"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb70-1"><a href="task-01.html#cb70-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(<span class="st">&quot;TRUE&quot;</span>)</span></code></pre></div>
<pre><code>## [1] &quot;character&quot;</code></pre>
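<p>Character is the most "free" data type: its content can be any characters, and every other data type can be converted to it. For example, you can put quotes around a numeric value and treat it as character data. The reverse does not always work: a string can be read as a number only if its content is formatted like a number. Conversion between types is covered in Section 1.3.2.2.</p>
<p>In R, single quotes (<code>'</code>) and double quotes (<code>"</code>) are equivalent, but we recommend double quotes in most cases, using single quotes only to enclose text that itself contains double quotes (for example <code>' This is an "example". '</code>). This is mainly to help users coming from other languages (C, C++, Java, and so on), where the two differ subtly. In C, single and double quotes are not equivalent; both kinds of quotes in R correspond roughly to double quotes in C.</p>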
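<p>A small sketch of the equivalence. The variable names are illustrative, and escaping with a backslash is shown as an alternative to switching the quote style.</p>

```r
# Single and double quotes create the same string
identical('text', "text")

# Two ways to put double quotes inside a string
quote_inside <- 'This is an "example".'     # single quotes outside
escaped      <- "This is an \"example\"."   # double quotes, escaped inside

identical(quote_inside, escaped)
cat(quote_inside, "\n")   # cat() shows the string without escape characters
```
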
</div>
</div>
<div id="向量vector" class="section level3" number="1.3.2">
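<h3><span class="header-section-number">1.3.2</span> Vectors</h3>
<p>Vectors here mainly means atomic vectors.
A vector is a one-dimensional sequence of values of the same type. Depending on the type of the values, we get different types of vectors.
Corresponding to the numeric, logical, and character basic data types above, there are numeric, logical, and character vectors.</p>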
<div class="sourceCode" id="cb72"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb72-1"><a href="task-01.html#cb72-1" aria-hidden="true" tabindex="-1"></a>vec_num <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)</span>
<span id="cb72-2"><a href="task-01.html#cb72-2" aria-hidden="true" tabindex="-1"></a>vec_log <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="cn">TRUE</span>, <span class="cn">FALSE</span>, <span class="cn">TRUE</span>)</span>
<span id="cb72-3"><a href="task-01.html#cb72-3" aria-hidden="true" tabindex="-1"></a>vec_cha <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;A&quot;</span>, <span class="st">&quot;B&quot;</span>, <span class="st">&quot;middle school&quot;</span>)</span></code></pre></div>
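<p>Vectors are built with the function <code>c()</code>. Operations can be applied to whole vectors at once, so there is no need to compute on each value separately.</p>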
<div class="sourceCode" id="cb73"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb73-1"><a href="task-01.html#cb73-1" aria-hidden="true" tabindex="-1"></a>vec_A <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)</span>
<span id="cb73-2"><a href="task-01.html#cb73-2" aria-hidden="true" tabindex="-1"></a>vec_B <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="dv">3</span>, <span class="dv">5</span>, <span class="dv">6</span>)</span>
<span id="cb73-3"><a href="task-01.html#cb73-3" aria-hidden="true" tabindex="-1"></a>vec_A <span class="sc">+</span> vec_B <span class="co"># same as c(1 + 3, 2 + 5, 3 + 6)</span></span></code></pre></div>
<pre><code>## [1] 4 7 9</code></pre>
<div class="sourceCode" id="cb75"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb75-1"><a href="task-01.html#cb75-1" aria-hidden="true" tabindex="-1"></a><span class="sc">!</span>vec_log</span></code></pre></div>
<pre><code>## [1] FALSE  TRUE FALSE</code></pre>
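<p>The same element-by-element behaviour applies to other operators. A brief sketch, reusing the two numeric vectors from above:</p>

```r
vec_A <- c(1, 2, 3)
vec_B <- c(3, 5, 6)

vec_A * vec_B   # element-wise product: 3 10 18
vec_A * 2       # a single number is applied to every element: 2 4 6
vec_B > 4       # comparisons return a logical vector: FALSE TRUE TRUE
```
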
<p>There are also functions that operate on vectors to compute summary statistics, such as <code>sum</code> for the sum, <code>var</code> for the variance, and <code>mean</code> for the mean:</p>
<div class="sourceCode" id="cb77"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb77-1"><a href="task-01.html#cb77-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sum</span>(vec_A)</span></code></pre></div>
<pre><code>## [1] 6</code></pre>
<div class="sourceCode" id="cb79"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb79-1"><a href="task-01.html#cb79-1" aria-hidden="true" tabindex="-1"></a><span class="fu">var</span>(vec_A)</span></code></pre></div>
<pre><code>## [1] 1</code></pre>
<div class="sourceCode" id="cb81"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb81-1"><a href="task-01.html#cb81-1" aria-hidden="true" tabindex="-1"></a><span class="fu">mean</span>(vec_A)</span></code></pre></div>
<pre><code>## [1] 2</code></pre>
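<p>It may help to see why <code>var(vec_A)</code> is 1. <code>var</code> computes the sample variance, which divides the sum of squared deviations by <code>n - 1</code> rather than <code>n</code>. The sketch below redoes that calculation by hand; the intermediate names are illustrative.</p>

```r
vec_A <- c(1, 2, 3)

n <- length(vec_A)
deviations <- vec_A - mean(vec_A)            # -1  0  1
manual_var <- sum(deviations^2) / (n - 1)    # (1 + 0 + 1) / 2

manual_var
all.equal(manual_var, var(vec_A))            # matches the built-in function
```
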
<div id="因子factor" class="section level4" number="1.3.2.1">
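<h4><span class="header-section-number">1.3.2.1</span> Factors</h4>
<p>Besides vectors made of the basic data types above, another important kind of vector is the factor, which can be created by combining the functions <code>factor</code> and <code>c</code>.</p>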
<div class="sourceCode" id="cb83"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb83-1"><a href="task-01.html#cb83-1" aria-hidden="true" tabindex="-1"></a>vec_fac <span class="ot">&lt;-</span> <span class="fu">factor</span>(<span class="fu">c</span>(<span class="st">&quot;male&quot;</span>, <span class="st">&quot;female&quot;</span>, <span class="st">&quot;male&quot;</span>, <span class="st">&quot;female&quot;</span>, <span class="st">&quot;female&quot;</span>))</span>
<span id="cb83-2"><a href="task-01.html#cb83-2" aria-hidden="true" tabindex="-1"></a>vec_fac</span></code></pre></div>
<pre><code>## [1] male   female male   female female
## Levels: female male</code></pre>
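<p>On the surface, a factor looks much like a character vector, since both are built from a series of quoted strings. The main difference is that a factor has a finite set of distinct values, called levels, and every element of the factor is one of these levels. In the example above, <code>vec_fac</code> has five elements but only two levels, "male" and "female".</p>
<div class="sourceCode" id="cb85"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb85-1"><a href="task-01.html#cb85-1" aria-hidden="true" tabindex="-1"></a><span class="co"># show the levels of the factor</span></span>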
<span id="cb85-2"><a href="task-01.html#cb85-2" aria-hidden="true" tabindex="-1"></a><span class="fu">levels</span>(vec_fac)</span></code></pre></div>
<pre><code>## [1] &quot;female&quot; &quot;male&quot;</code></pre>
<p>You can also create a factor with an inherent order, using the function <code>ordered</code> or the argument <code>ordered = TRUE</code> of <code>factor</code>. The order can be set manually with the <code>levels</code> argument:</p>
<div class="sourceCode" id="cb87"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb87-1"><a href="task-01.html#cb87-1" aria-hidden="true" tabindex="-1"></a>educ <span class="ot">&lt;-</span> <span class="fu">ordered</span>(</span>
<span id="cb87-2"><a href="task-01.html#cb87-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">c</span>(<span class="st">&quot;kindergarten&quot;</span>, <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>, </span>
<span id="cb87-3"><a href="task-01.html#cb87-3" aria-hidden="true" tabindex="-1"></a>    <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>, <span class="st">&quot;kindergarten&quot;</span>),</span>
<span id="cb87-4"><a href="task-01.html#cb87-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">levels =</span> <span class="fu">c</span>(<span class="st">&quot;kindergarten&quot;</span>, <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>)</span>
<span id="cb87-5"><a href="task-01.html#cb87-5" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb87-6"><a href="task-01.html#cb87-6" aria-hidden="true" tabindex="-1"></a><span class="co"># equivalent to</span></span>
<span id="cb87-7"><a href="task-01.html#cb87-7" aria-hidden="true" tabindex="-1"></a>educ <span class="ot">&lt;-</span> <span class="fu">factor</span>(</span>
<span id="cb87-8"><a href="task-01.html#cb87-8" aria-hidden="true" tabindex="-1"></a>  <span class="fu">c</span>(<span class="st">&quot;kindergarten&quot;</span>, <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>, </span>
<span id="cb87-9"><a href="task-01.html#cb87-9" aria-hidden="true" tabindex="-1"></a>    <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>, <span class="st">&quot;kindergarten&quot;</span>),</span>
<span id="cb87-10"><a href="task-01.html#cb87-10" aria-hidden="true" tabindex="-1"></a>  <span class="at">ordered =</span> <span class="cn">TRUE</span>, </span>
<span id="cb87-11"><a href="task-01.html#cb87-11" aria-hidden="true" tabindex="-1"></a>  <span class="at">levels =</span> <span class="fu">c</span>(<span class="st">&quot;kindergarten&quot;</span>, <span class="st">&quot;primary school&quot;</span>, <span class="st">&quot;middle school&quot;</span>)</span>
<span id="cb87-12"><a href="task-01.html#cb87-12" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb87-13"><a href="task-01.html#cb87-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb87-14"><a href="task-01.html#cb87-14" aria-hidden="true" tabindex="-1"></a>educ</span></code></pre></div>
<pre><code>## [1] kindergarten   primary school middle school  primary school middle school 
## [6] kindergarten  
## Levels: kindergarten &lt; primary school &lt; middle school</code></pre>
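<p>Internally, R stores a factor as an integer vector. This means that replacing a character vector with a factor can save memory, especially when the same values repeat many times.</p>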
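<p>The integer storage can be inspected directly. The following sketch reuses the <code>vec_fac</code> example from above:</p>

```r
vec_fac <- factor(c("male", "female", "male", "female", "female"))

typeof(vec_fac)       # "integer": how the values are stored
class(vec_fac)        # "factor": how R treats the object
as.integer(vec_fac)   # the stored codes: 2 1 2 1 1
levels(vec_fac)       # the labels the codes point to: "female" "male"
```

<p>Each code is the position of the element's label in <code>levels(vec_fac)</code>, so "male" is stored as 2 and "female" as 1.</p>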
</div>
<div id="transform" class="section level4" number="1.3.2.2">
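<h4><span class="header-section-number">1.3.2.2</span> Converting between types</h4>
<p>Different vector/data types can be converted into one another. Whether a conversion works depends on how complex (or how "free") the types are. Ranking the vector types mentioned so far from most to least free gives</p>
<blockquote>
<p>character &gt; numeric &gt; logical</p>
</blockquote>
<p>Within the numeric types, the ranking from most to least free is</p>
<blockquote>
<p>complex &gt; double &gt; integer</p>
</blockquote>
<p>The closer a type is to character, the more "free" it is, and a less free type can always be converted to a type at the same level or a freer one. Character vectors are the freest: they can hold any raw value, and every other type can be converted to character. Taking a logical vector, the most restricted type, as an example, we show how to use some common conversion functions along this ranking:</p>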
<div class="sourceCode" id="cb89"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb89-1"><a href="task-01.html#cb89-1" aria-hidden="true" tabindex="-1"></a>vec_loc <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="cn">TRUE</span>, <span class="cn">FALSE</span>, <span class="cn">TRUE</span>)</span>
<span id="cb89-2"><a href="task-01.html#cb89-2" aria-hidden="true" tabindex="-1"></a><span class="co"># logical to numeric</span></span>
<span id="cb89-3"><a href="task-01.html#cb89-3" aria-hidden="true" tabindex="-1"></a>vec_num <span class="ot">&lt;-</span> <span class="fu">as.numeric</span>(vec_loc)</span>
<span id="cb89-4"><a href="task-01.html#cb89-4" aria-hidden="true" tabindex="-1"></a>vec_num</span></code></pre></div>
<pre><code>## [1] 1 0 1</code></pre>
<div class="sourceCode" id="cb91"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb91-1"><a href="task-01.html#cb91-1" aria-hidden="true" tabindex="-1"></a><span class="co"># numeric to character</span></span>
<span id="cb91-2"><a href="task-01.html#cb91-2" aria-hidden="true" tabindex="-1"></a>vec_cha <span class="ot">&lt;-</span> <span class="fu">as.character</span>(vec_num)</span>
<span id="cb91-3"><a href="task-01.html#cb91-3" aria-hidden="true" tabindex="-1"></a>vec_cha</span></code></pre></div>
<pre><code>## [1] &quot;1&quot; &quot;0&quot; &quot;1&quot;</code></pre>
<div class="sourceCode" id="cb93"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb93-1"><a href="task-01.html#cb93-1" aria-hidden="true" tabindex="-1"></a><span class="co"># logical to character</span></span>
<span id="cb93-2"><a href="task-01.html#cb93-2" aria-hidden="true" tabindex="-1"></a>vec_cha2 <span class="ot">&lt;-</span> <span class="fu">as.character</span>(vec_loc)</span>
<span id="cb93-3"><a href="task-01.html#cb93-3" aria-hidden="true" tabindex="-1"></a>vec_cha2</span></code></pre></div>
<pre><code>## [1] &quot;TRUE&quot;  &quot;FALSE&quot; &quot;TRUE&quot;</code></pre>
<div class="sourceCode" id="cb95"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb95-1"><a href="task-01.html#cb95-1" aria-hidden="true" tabindex="-1"></a><span class="do">## the reverse direction</span></span>
<span id="cb95-2"><a href="task-01.html#cb95-2" aria-hidden="true" tabindex="-1"></a><span class="co"># character to numeric</span></span>
<span id="cb95-3"><a href="task-01.html#cb95-3" aria-hidden="true" tabindex="-1"></a><span class="fu">as.numeric</span>(vec_cha)</span></code></pre></div>
<pre><code>## [1] 1 0 1</code></pre>
<div class="sourceCode" id="cb97"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb97-1"><a href="task-01.html#cb97-1" aria-hidden="true" tabindex="-1"></a><span class="co"># character to logical</span></span>
<span id="cb97-2"><a href="task-01.html#cb97-2" aria-hidden="true" tabindex="-1"></a><span class="fu">as.logical</span>(vec_cha2)</span></code></pre></div>
<pre><code>## [1]  TRUE FALSE  TRUE</code></pre>
<div class="sourceCode" id="cb99"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb99-1"><a href="task-01.html#cb99-1" aria-hidden="true" tabindex="-1"></a><span class="co"># numeric to logical</span></span>
<span id="cb99-2"><a href="task-01.html#cb99-2" aria-hidden="true" tabindex="-1"></a><span class="fu">as.logical</span>(vec_num)</span></code></pre></div>
<pre><code>## [1]  TRUE FALSE  TRUE</code></pre>
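<p>Here we can see that the logical values <code>TRUE</code> and <code>FALSE</code> correspond to the numeric values 1 and 0.</p>
<p>A less free type can always be converted to a freer one, but going the other way, from a freer type to a less free one, only works for values that meet certain conditions. For example:</p>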
<ol style="list-style-type: decimal">
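<li>When converting character to numeric, the string must be formatted like a number;</li>
<li>When converting numeric to logical, 0 becomes <code>FALSE</code> and any other number becomes <code>TRUE</code>;</li>
<li>When converting character to logical, the string must be a recognised spelling of a logical value, such as <code>&quot;TRUE&quot;</code> or <code>&quot;FALSE&quot;</code>.</li>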
</ol>
<p>Values that do not follow these rules become <code>NA</code>. <code>NA</code> stands for "Not Available", that is, a missing value. Handling missing values is covered in the next chapter, "Data Cleaning and Preparation".</p>
<div class="sourceCode" id="cb101"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb101-1"><a href="task-01.html#cb101-1" aria-hidden="true" tabindex="-1"></a><span class="co"># produces missing values</span></span>
<span id="cb101-2"><a href="task-01.html#cb101-2" aria-hidden="true" tabindex="-1"></a><span class="fu">as.logical</span>(<span class="fu">c</span>(<span class="st">&quot;some&quot;</span>, <span class="st">&quot;random&quot;</span>, <span class="st">&quot;strings&quot;</span>))</span></code></pre></div>
<pre><code>## [1] NA NA NA</code></pre>
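<p>A few more conversions from a freer type to a less free one, as a sketch with made-up strings. <code>suppressWarnings()</code> is used only to keep the output short; without it, <code>as.numeric</code> also warns that NAs were introduced by coercion.</p>

```r
# Character to logical: a few spellings are recognised, everything else is NA
chr_to_lgl <- as.logical(c("TRUE", "true", "T", "yes"))
chr_to_lgl    # TRUE TRUE TRUE NA

# Character to numeric: the whole string must look like a number
chr_to_num <- suppressWarnings(as.numeric(c("3.14", "1e3", "42 apples")))
chr_to_num    # 3.14 1000.00 NA

# Numeric to logical: 0 is FALSE, any other number is TRUE
as.logical(c(0, 1, -2.5))
```
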
<p>Factor is a somewhat special type: it can be converted to and from both numeric and character.</p>
<div class="sourceCode" id="cb103"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb103-1"><a href="task-01.html#cb103-1" aria-hidden="true" tabindex="-1"></a>vec_fac <span class="ot">&lt;-</span> <span class="fu">factor</span>(<span class="fu">c</span>(<span class="st">&quot;male&quot;</span>, <span class="st">&quot;female&quot;</span>, <span class="st">&quot;male&quot;</span>, <span class="st">&quot;female&quot;</span>, <span class="st">&quot;female&quot;</span>))</span>
<span id="cb103-2"><a href="task-01.html#cb103-2" aria-hidden="true" tabindex="-1"></a><span class="co"># factor to numeric</span></span>
<span id="cb103-3"><a href="task-01.html#cb103-3" aria-hidden="true" tabindex="-1"></a>vec_num <span class="ot">&lt;-</span> <span class="fu">as.numeric</span>(vec_fac)</span>
<span id="cb103-4"><a href="task-01.html#cb103-4" aria-hidden="true" tabindex="-1"></a>vec_num</span></code></pre></div>
<pre><code>## [1] 2 1 2 1 1</code></pre>
<div class="sourceCode" id="cb105"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb105-1"><a href="task-01.html#cb105-1" aria-hidden="true" tabindex="-1"></a><span class="co"># factor to character</span></span>
<span id="cb105-2"><a href="task-01.html#cb105-2" aria-hidden="true" tabindex="-1"></a>vec_cha <span class="ot">&lt;-</span> <span class="fu">as.character</span>(vec_fac)</span>
<span id="cb105-3"><a href="task-01.html#cb105-3" aria-hidden="true" tabindex="-1"></a>vec_cha</span></code></pre></div>
<pre><code>## [1] &quot;male&quot;   &quot;female&quot; &quot;male&quot;   &quot;female&quot; &quot;female&quot;</code></pre>
<div class="sourceCode" id="cb107"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb107-1"><a href="task-01.html#cb107-1" aria-hidden="true" tabindex="-1"></a><span class="co"># character to factor</span></span>
<span id="cb107-2"><a href="task-01.html#cb107-2" aria-hidden="true" tabindex="-1"></a><span class="fu">as.factor</span>(vec_cha)</span></code></pre></div>
<pre><code>## [1] male   female male   female female
## Levels: female male</code></pre>
<div class="sourceCode" id="cb109"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb109-1"><a href="task-01.html#cb109-1" aria-hidden="true" tabindex="-1"></a><span class="co"># numeric to factor</span></span>
<span id="cb109-2"><a href="task-01.html#cb109-2" aria-hidden="true" tabindex="-1"></a><span class="fu">as.factor</span>(vec_num)</span></code></pre></div>
<pre><code>## [1] 2 1 2 1 1
## Levels: 1 2</code></pre>
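<p>As noted above, R handles factors internally as integers, which is why something that looks like text can be turned into numbers.
Note that converting a factor to another type loses some information:</p>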
<ol style="list-style-type: decimal">
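<li>Converting a factor to a character vector loses the information about its levels;</li>
<li>Converting a factor to numeric loses the labels and keeps only the level codes, that is, positive integers following the order of the levels.</li>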
</ol>
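<p>The second point is a common pitfall when the labels of a factor are themselves numbers. In the sketch below, <code>fac_year</code> is a made-up example; going through <code>as.character</code> first is one way to recover the original numbers.</p>

```r
fac_year <- factor(c("2021", "2019", "2021", "2020"))
levels(fac_year)                       # "2019" "2020" "2021"

# Direct conversion returns the level codes, not the years
as.numeric(fac_year)                   # 3 1 3 2

# Convert to character first to recover the values in the labels
as.numeric(as.character(fac_year))     # 2021 2019 2021 2020
```
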
</div>
<div id="向量命名" class="section level4" number="1.3.2.3">
<h4><span class="header-section-number">1.3.2.3</span> Naming vector elements</h4>
<p>Besides the name of the vector itself, we can also give a name to each element of the vector.</p>
<div class="sourceCode" id="cb111"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb111-1"><a href="task-01.html#cb111-1" aria-hidden="true" tabindex="-1"></a><span class="co"># first name the vector</span></span>
<span id="cb111-2"><a href="task-01.html#cb111-2" aria-hidden="true" tabindex="-1"></a><span class="co"># then name its elements</span></span>
<span id="cb111-3"><a href="task-01.html#cb111-3" aria-hidden="true" tabindex="-1"></a>vec <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)</span>
<span id="cb111-4"><a href="task-01.html#cb111-4" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(vec) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;A&quot;</span>, <span class="st">&quot;B&quot;</span>, <span class="st">&quot;C&quot;</span>)</span>
<span id="cb111-5"><a href="task-01.html#cb111-5" aria-hidden="true" tabindex="-1"></a>vec</span></code></pre></div>
<pre><code>## A B C 
## 1 2 3</code></pre>
<div class="sourceCode" id="cb113"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb113-1"><a href="task-01.html#cb113-1" aria-hidden="true" tabindex="-1"></a><span class="co"># or</span></span>
<span id="cb113-2"><a href="task-01.html#cb113-2" aria-hidden="true" tabindex="-1"></a><span class="co"># name the elements when creating the vector</span></span>
<span id="cb113-3"><a href="task-01.html#cb113-3" aria-hidden="true" tabindex="-1"></a>vec <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="at">A =</span> <span class="dv">1</span>, <span class="at">B =</span> <span class="dv">2</span>, <span class="at">C =</span> <span class="dv">3</span>)</span>
<span id="cb113-4"><a href="task-01.html#cb113-4" aria-hidden="true" tabindex="-1"></a>vec</span></code></pre></div>
<pre><code>## A B C 
## 1 2 3</code></pre>
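<p>The names can also be read back, changed, or dropped. A short sketch; the replacement name <code>"beta"</code> is arbitrary.</p>

```r
vec <- c(A = 1, B = 2, C = 3)

names(vec)                 # read the names: "A" "B" "C"

names(vec)[2] <- "beta"    # rename only the second element
names(vec)                 # "A" "beta" "C"

unname(vec)                # the values without any names: 1 2 3
```
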
</div>
<div id="访问向量的子集" class="section level4" number="1.3.2.4">
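<h4><span class="header-section-number">1.3.2.4</span> Subsetting vectors</h4>
<p>There are three subsetting operators: <code>[</code>, <code>[[</code>, and <code>$</code> (of these, <code>$</code> cannot be used on atomic vectors).</p>
<p>There are six ways to subset a vector:</p>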
<ol style="list-style-type: decimal">
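<li>Positive integers: select elements by position;</li>
<li>Negative integers: drop elements by position;</li>
<li>A logical vector of the same length as the vector: its elements are matched one to one with the vector's elements; <code>TRUE</code> keeps the element and <code>FALSE</code> drops it;</li>
<li>Nothing: returns the original vector;</li>
<li>Zero (0): selects nothing;</li>
<li>A character vector: selects elements by name.</li>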
</ol>
<p>Examples using <code>[</code> as the subsetting operator:</p>
<div class="sourceCode" id="cb115"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb115-1"><a href="task-01.html#cb115-1" aria-hidden="true" tabindex="-1"></a>vec <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="at">a =</span> <span class="fl">1.2</span>, <span class="at">b =</span> <span class="fl">5.6</span>, <span class="at">c =</span> <span class="fl">8.4</span>, <span class="at">d =</span> <span class="fl">9.5</span>)</span>
<span id="cb115-2"><a href="task-01.html#cb115-2" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Positive integers</span></span>
<span id="cb115-3"><a href="task-01.html#cb115-3" aria-hidden="true" tabindex="-1"></a>vec[<span class="fu">c</span>(<span class="dv">1</span>,<span class="dv">3</span>)]</span></code></pre></div>
<pre><code>##   a   c 
## 1.2 8.4</code></pre>
<div class="sourceCode" id="cb117"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb117-1"><a href="task-01.html#cb117-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Negative integers</span></span>
<span id="cb117-2"><a href="task-01.html#cb117-2" aria-hidden="true" tabindex="-1"></a>vec[<span class="fu">c</span>(<span class="sc">-</span><span class="dv">1</span>,<span class="sc">-</span><span class="dv">3</span>)]</span></code></pre></div>
<pre><code>##   b   d 
## 5.6 9.5</code></pre>
<div class="sourceCode" id="cb119"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb119-1"><a href="task-01.html#cb119-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Logical vector</span></span>
<span id="cb119-2"><a href="task-01.html#cb119-2" aria-hidden="true" tabindex="-1"></a>vec[<span class="fu">c</span>(<span class="cn">TRUE</span>, <span class="cn">FALSE</span>, <span class="cn">FALSE</span>, <span class="cn">TRUE</span>)]</span></code></pre></div>
<pre><code>##   a   d 
## 1.2 9.5</code></pre>
<div class="sourceCode" id="cb121"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb121-1"><a href="task-01.html#cb121-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. Nothing</span></span>
<span id="cb121-2"><a href="task-01.html#cb121-2" aria-hidden="true" tabindex="-1"></a>vec[]</span></code></pre></div>
<pre><code>##   a   b   c   d 
## 1.2 5.6 8.4 9.5</code></pre>
<div class="sourceCode" id="cb123"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb123-1"><a href="task-01.html#cb123-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Zero</span></span>
<span id="cb123-2"><a href="task-01.html#cb123-2" aria-hidden="true" tabindex="-1"></a>vec[<span class="dv">0</span>]</span></code></pre></div>
<pre><code>## named numeric(0)</code></pre>
<div class="sourceCode" id="cb125"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb125-1"><a href="task-01.html#cb125-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. Character vector</span></span>
<span id="cb125-2"><a href="task-01.html#cb125-2" aria-hidden="true" tabindex="-1"></a>vec[<span class="fu">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;c&quot;</span>)]</span></code></pre></div>
<pre><code>##   a   c 
## 1.2 8.4</code></pre>
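<p>A common special case of method 3 is to build the logical vector from a condition on the vector itself. The following is a minimal sketch that reuses the same <code>vec</code>; the threshold <code>5</code> and the name <code>keep</code> are chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vec &lt;- c(a = 1.2, b = 5.6, c = 8.4, d = 9.5)

# Comparing a vector with a number yields a logical vector of the same length
keep &lt;- vec &gt; 5
keep

# Inside `[`, it keeps only the elements where the condition is TRUE
vec[keep]

# The two steps are usually written in one line
vec[vec &gt; 5]</code></pre></div>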
<p>With vectors, <code>[[</code> can only select a single element, rather than a subset as <code>[</code> does:</p>
<div class="sourceCode" id="cb127"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb127-1"><a href="task-01.html#cb127-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Works</span></span>
<span id="cb127-2"><a href="task-01.html#cb127-2" aria-hidden="true" tabindex="-1"></a>vec[[<span class="dv">1</span>]]</span></code></pre></div>
<pre><code>## [1] 1.2</code></pre>
<div class="sourceCode" id="cb129"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb129-1"><a href="task-01.html#cb129-1" aria-hidden="true" tabindex="-1"></a>vec[<span class="dv">1</span>]</span></code></pre></div>
<pre><code>##   a 
## 1.2</code></pre>
<div class="sourceCode" id="cb131"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb131-1"><a href="task-01.html#cb131-1" aria-hidden="true" tabindex="-1"></a>vec[<span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">3</span>)]</span></code></pre></div>
<pre><code>##   a   c 
## 1.2 8.4</code></pre>
<div class="sourceCode" id="cb133"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb133-1"><a href="task-01.html#cb133-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Does not work</span></span>
<span id="cb133-2"><a href="task-01.html#cb133-2" aria-hidden="true" tabindex="-1"></a>vec[[<span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">3</span>)]]</span></code></pre></div>
<pre><code>## Error in vec[[c(1, 3)]]: attempt to select more than one element in vectorIndex</code></pre>
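<p>For this reason, we recommend using <code>[[</code> rather than <code>[</code> whenever you intend to select exactly one element. If a function behaves unexpectedly and ends up selecting more than one element, the error message alerts you right away.</p>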
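<p>This fail-fast behaviour can be seen in a small helper function. The sketch below is only an illustration: <code>first_match</code> and its <code>limit</code> argument are hypothetical names, and the function is meant to return exactly one element.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vec &lt;- c(a = 1.2, b = 5.6, c = 8.4, d = 9.5)

# Hypothetical helper: return the one element of x that is larger than `limit`
first_match &lt;- function(x, limit) {
  idx &lt;- which(x &gt; limit)
  x[[idx]]   # raises an error unless idx holds exactly one position
}

# Exactly one element is larger than 9, so this succeeds
first_match(vec, 9)

# Three elements are larger than 5, so `[[` raises an error
# instead of silently returning several values
res &lt;- tryCatch(first_match(vec, 5), error = function(e) &quot;error caught&quot;)
res</code></pre></div>
<p>Had the helper used <code>x[idx]</code>, the second call would have returned three values without any complaint.</p>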
</div>
</div>
<div id="特殊数据类型" class="section level3" number="1.3.3">
<h3><span class="header-section-number">1.3.3</span> Special data types</h3>
<div id="日期date" class="section level4" number="1.3.3.1">
<h4><span class="header-section-number">1.3.3.1</span> Dates (date)</h4>
<p>R has a dedicated class for dates, <code>Date</code>, and two date-time classes, <code>POSIXct</code> and <code>POSIXlt</code>. In this section, however, I mainly want to introduce <code>lubridate</code>, a package dedicated to working with dates.</p>
<div class="sourceCode" id="cb135"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb135-1"><a href="task-01.html#cb135-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(lubridate)</span></code></pre></div>
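<p>Under the hood a date is just a number, but dates follow their own arithmetic and their own irregular units. For example, a month can have anywhere from 28 to 31 days, so the number of days that rolls over into the next month varies accordingly. The <code>ymd</code> (year-month-day) function in <code>lubridate</code> helps with this:</p>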
<div class="sourceCode" id="cb136"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb136-1"><a href="task-01.html#cb136-1" aria-hidden="true" tabindex="-1"></a>sevenseven <span class="ot">&lt;-</span> <span class="fu">ymd</span>(<span class="st">&quot;2021-07-07&quot;</span>) </span>
<span id="cb136-2"><a href="task-01.html#cb136-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb136-3"><a href="task-01.html#cb136-3" aria-hidden="true" tabindex="-1"></a>sevenseven</span></code></pre></div>
<pre><code>## [1] &quot;2021-07-07&quot;</code></pre>
<div class="sourceCode" id="cb138"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb138-1"><a href="task-01.html#cb138-1" aria-hidden="true" tabindex="-1"></a><span class="fu">typeof</span>(sevenseven)</span></code></pre></div>
<pre><code>## [1] &quot;double&quot;</code></pre>
<div class="sourceCode" id="cb140"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb140-1"><a href="task-01.html#cb140-1" aria-hidden="true" tabindex="-1"></a><span class="fu">class</span>(sevenseven)</span></code></pre></div>
<pre><code>## [1] &quot;Date&quot;</code></pre>
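<p>Note that the date is printed in the familiar year-month-day form, yet the object has class <code>Date</code> and type <code>double</code>, which means the date can be treated as a plain number in calculations. For example, 7 July plus one day is 8 July:</p>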
<div class="sourceCode" id="cb142"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb142-1"><a href="task-01.html#cb142-1" aria-hidden="true" tabindex="-1"></a>sevenseven <span class="sc">+</span> <span class="dv">1</span></span></code></pre></div>
<pre><code>## [1] &quot;2021-07-08&quot;</code></pre>
<p>7 July plus one month is 7 August:</p>
<div class="sourceCode" id="cb144"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb144-1"><a href="task-01.html#cb144-1" aria-hidden="true" tabindex="-1"></a>sevenseven <span class="sc">+</span> <span class="fu">months</span>(<span class="dv">1</span>)</span></code></pre></div>
<pre><code>## [1] &quot;2021-08-07&quot;</code></pre>
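<p>Adding months is less obvious when the target month is shorter than the starting one. As far as I know, <code>lubridate</code> returns <code>NA</code> when <code>+ months()</code> would land on a day that does not exist, and offers the <code>%m+%</code> operator to roll back to the last day of the month instead. The date below is chosen only to illustrate this:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(lubridate)

jan31 &lt;- ymd(&quot;2021-01-31&quot;)

# 31 February does not exist, so plain addition gives NA
jan31 + months(1)

# %m+% rolls back to the last valid day of the target month
jan31 %m+% months(1)</code></pre></div>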
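<p>All <code>ymd</code> does is recognize the input string and return a date value, as long as the string follows a year-month-day order. If the string is in a different order, that is fine too: <code>lubridate</code> provides the corresponding functions month-year-day <code>myd</code>, day-month-year <code>dmy</code>, month-day-year <code>mdy</code>, day-year-month <code>dym</code>, and even year-quarter <code>yq</code>.</p>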
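<p>As a short sketch of how these parsers relate to one another, the same day can be written in three different orders and parsed to the same value. The example date and the names <code>d1</code> to <code>d3</code> are chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(lubridate)

# 8 July 2021 written in three different orders
d1 &lt;- ymd(&quot;2021-07-08&quot;)
d2 &lt;- dmy(&quot;08/07/2021&quot;)
d3 &lt;- mdy(&quot;07/08/2021&quot;)

# All three are parsed to the same Date value
d1 == d2
d2 == d3

# Underneath, a Date is the number of days since 1970-01-01
as.numeric(d1)
as.numeric(d1 - ymd(&quot;1970-01-01&quot;))</code></pre></div>
<p>The string <code>&quot;07/08/2021&quot;</code> is ambiguous on its own; choosing <code>mdy</code> or <code>dmy</code> is what tells R how to read it.</p>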
<p>For more on <code>lubridate</code>, see the <a href="https://lubridate.tidyverse.org/"><code>lubridate</code> home page</a>.</p>
</div>
<div id="时间序列ts" class="section level4" number="1.3.3.2">
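<h4><span class="header-section-number">1.3.3.2</span> Time series (ts)</h4>
<p>Time series, a special kind of data characterized by autocorrelation, also get their own treatment in R. The function that creates a time series is <code>ts</code>, short for time series:</p>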
<div class="sourceCode" id="cb146"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb146-1"><a href="task-01.html#cb146-1" aria-hidden="true" tabindex="-1"></a>xts <span class="ot">&lt;-</span> <span class="fu">ts</span>(<span class="fu">rnorm</span>(<span class="dv">12</span>), <span class="at">start =</span> <span class="fu">c</span>(<span class="dv">2021</span>, <span class="dv">1</span>), <span class="at">frequency =</span> <span class="dv">4</span>)</span>
<span id="cb146-2"><a href="task-01.html#cb146-2" aria-hidden="true" tabindex="-1"></a>xts</span></code></pre></div>
<pre><code>##             Qtr1        Qtr2        Qtr3        Qtr4
## 2021  1.09045039  0.08268216  0.20798636  1.30065672
## 2022 -0.47836458  1.34891625  1.22755628  0.43274872
## 2023  0.49595793  2.08228445  0.52394276  0.08736316</code></pre>
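<p>The sequence created here now has the properties of a time series. The <code>start</code> argument of <code>ts</code> sets the time of the first observation, and the <code>frequency</code> argument sets the number of observations per unit of time (here, per year). In the example above, we created a quarterly time series that starts in the first quarter of 2021 and spans three years. There are also functions for extracting this information from a time series:</p>
<div class="sourceCode" id="cb148"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb148-1"><a href="task-01.html#cb148-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Start time</span></span>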
<span id="cb148-2"><a href="task-01.html#cb148-2" aria-hidden="true" tabindex="-1"></a><span class="fu">start</span>(xts)</span></code></pre></div>
<pre><code>## [1] 2021    1</code></pre>
<div class="sourceCode" id="cb150"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb150-1"><a href="task-01.html#cb150-1" aria-hidden="true" tabindex="-1"></a><span class="co"># End time</span></span>
<span id="cb150-2"><a href="task-01.html#cb150-2" aria-hidden="true" tabindex="-1"></a><span class="fu">end</span>(xts)</span></code></pre></div>
<pre><code>## [1] 2023    4</code></pre>
<div class="sourceCode" id="cb152"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb152-1"><a href="task-01.html#cb152-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Frequency</span></span>
<span id="cb152-2"><a href="task-01.html#cb152-2" aria-hidden="true" tabindex="-1"></a><span class="fu">frequency</span>(xts)</span></code></pre></div>
<pre><code>## [1] 4</code></pre>
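<p>Beyond these accessors, base R also lets you cut a time series by time rather than by position. The sketch below rebuilds a series like the one above (the seed is arbitrary and only makes the random numbers reproducible) and uses <code>window</code>, <code>time</code> and <code>cycle</code>; the name <code>xts_2022</code> is chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(2021)  # arbitrary seed, only for reproducibility
xts &lt;- ts(rnorm(12), start = c(2021, 1), frequency = 4)

# Keep only the four quarters of 2022
xts_2022 &lt;- window(xts, start = c(2022, 1), end = c(2022, 4))
xts_2022

# time() returns the time of each observation as a decimal year
time(xts)

# cycle() returns the position of each observation within the year (1 to 4)
cycle(xts)</code></pre></div>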
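<p>The benefit of using time series objects is that time series models can be applied with very simple commands. For example, we can use the <code>forecast</code> package to forecast monthly Australian gas production with an ARIMA model:</p>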
<div class="sourceCode" id="cb154"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb154-1"><a href="task-01.html#cb154-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(forecast)</span>
<span id="cb154-2"><a href="task-01.html#cb154-2" aria-hidden="true" tabindex="-1"></a>gas <span class="sc">%&gt;%</span> </span>
<span id="cb154-3"><a href="task-01.html#cb154-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">auto.arima</span>() <span class="sc">%&gt;%</span> </span>
<span id="cb154-4"><a href="task-01.html#cb154-4" aria-hidden="true" tabindex="-1"></a>  <span class="fu">forecast</span>(<span class="dv">36</span>) <span class="sc">%&gt;%</span> </span>
<span id="cb154-5"><a href="task-01.html#cb154-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">autoplot</span>()</span></code></pre></div>
<p><img src="RLearning_files/figure-html/unnamed-chunk-44-1.png" width="672" /></p>
<p>For more on time series analysis and forecasting, see the <a href="https://tidyverts.org/"><code>tidyverts</code> family of packages</a>, the <a href="https://pkg.robjhyndman.com/forecast/"><code>forecast</code> package</a>, and others.</p>
</div>
</div>
</div>
<div id="多维数据类型" class="section level2" number="1.4">
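<h2><span class="header-section-number">1.4</span> Multidimensional data types</h2>
<p>The data types discussed so far are all sequences (vectors), that is, one-dimensional data. In this section we look at data types with two or more dimensions.</p>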
<div id="矩阵matrix" class="section level3" number="1.4.1">
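<h3><span class="header-section-number">1.4.1</span> Matrices (matrix)</h3>
<p>Matrices in R are very similar to matrices in mathematics. In mathematics, a matrix is a collection of numbers arranged in a rectangular array with a fixed number of rows and columns. In R, a matrix is a collection of values of the same type arranged in a rectangular array with a fixed number of rows and columns. You can create a matrix with the <code>matrix</code> function:</p>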
<div class="sourceCode" id="cb155"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb155-1"><a href="task-01.html#cb155-1" aria-hidden="true" tabindex="-1"></a><span class="fu">matrix</span>(<span class="dv">1</span><span class="sc">:</span><span class="dv">9</span>, <span class="at">nrow =</span> <span class="dv">3</span>)</span></code></pre></div>
<pre><code>##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9</code></pre>
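<p>Here, the first argument is the content of the matrix. <code>1:9</code> is shorthand for <code>c(1, 2, 3, 4, 5, 6, 7, 8, 9)</code> and creates a sequence of integers in steps of 1.</p>
<p>The second argument tells R how many rows the matrix should have. You can also use <code>ncol</code> to tell R how many columns it has. By default, R fills the matrix column by column: top to bottom within a column, then left to right across columns. If you want R to fill it row by row instead (left to right first), set the <code>byrow</code> argument to <code>TRUE</code>:</p>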
<div class="sourceCode" id="cb157"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb157-1"><a href="task-01.html#cb157-1" aria-hidden="true" tabindex="-1"></a><span class="fu">matrix</span>(<span class="dv">1</span><span class="sc">:</span><span class="dv">9</span>, <span class="at">ncol =</span> <span class="dv">3</span>, <span class="at">byrow =</span> <span class="cn">TRUE</span>)</span></code></pre></div>
<pre><code>##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9</code></pre>
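<p>One way to check the two fill orders against each other: filling by row gives the transpose of filling by column. A minimal sketch, with <code>by_col</code> and <code>by_row</code> as names chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">by_col &lt;- matrix(1:9, nrow = 3)                # default: fill column by column
by_row &lt;- matrix(1:9, nrow = 3, byrow = TRUE)  # fill row by row

# dim() returns the number of rows and columns
dim(by_col)

# Filling by row gives the transpose of filling by column
identical(by_row, t(by_col))</code></pre></div>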
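<p>Matrices in R are not limited to numbers; the only requirement is that all values share one type: in a numeric matrix every cell is numeric, and in a character matrix every value is a character string.</p>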
<div class="sourceCode" id="cb159"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb159-1"><a href="task-01.html#cb159-1" aria-hidden="true" tabindex="-1"></a>mat_month <span class="ot">&lt;-</span> <span class="fu">matrix</span>(month.name, <span class="at">nrow =</span> <span class="dv">4</span>, <span class="at">byrow =</span> <span class="cn">TRUE</span>)</span>
<span id="cb159-2"><a href="task-01.html#cb159-2" aria-hidden="true" tabindex="-1"></a>mat_month</span></code></pre></div>
<pre><code>##      [,1]      [,2]       [,3]       
## [1,] &quot;January&quot; &quot;February&quot; &quot;March&quot;    
## [2,] &quot;April&quot;   &quot;May&quot;      &quot;June&quot;     
## [3,] &quot;July&quot;    &quot;August&quot;   &quot;September&quot;
## [4,] &quot;October&quot; &quot;November&quot; &quot;December&quot;</code></pre>
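<p>If values of different types are supplied anyway, R coerces them to a common type before building the matrix. The following sketch shows this with two small matrices; the names <code>mixed</code> and <code>num_lgl</code> are chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Mixing numbers, strings and logicals: everything is coerced to character
mixed &lt;- matrix(c(1, &quot;a&quot;, TRUE, 2), nrow = 2)
mixed
typeof(mixed)

# Mixing numbers and logicals: TRUE and FALSE become 1 and 0
num_lgl &lt;- matrix(c(1.5, TRUE, FALSE, 2), nrow = 2)
num_lgl
typeof(num_lgl)</code></pre></div>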
<div id="矩阵命名" class="section level4" number="1.4.1.1">
<h4><span class="header-section-number">1.4.1.1</span> Naming a matrix</h4>
<p>For a matrix, naming mostly concerns the row names (<code>rownames</code>) and the column names (<code>colnames</code>):</p>
<div class="sourceCode" id="cb161"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb161-1"><a href="task-01.html#cb161-1" aria-hidden="true" tabindex="-1"></a><span class="co"># You can use these two functions to change the row and column names</span></span>
<span id="cb161-2"><a href="task-01.html#cb161-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rownames</span>(mat_month) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;Quarter1&quot;</span>, <span class="st">&quot;Quarter2&quot;</span>, <span class="st">&quot;Quarter3&quot;</span>, <span class="st">&quot;Quarter4&quot;</span>)</span>
<span id="cb161-3"><a href="task-01.html#cb161-3" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(mat_month) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;Month1&quot;</span>, <span class="st">&quot;Month2&quot;</span>, <span class="st">&quot;Month3&quot;</span>)</span>
<span id="cb161-4"><a href="task-01.html#cb161-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-5"><a href="task-01.html#cb161-5" aria-hidden="true" tabindex="-1"></a>mat_month</span></code></pre></div>
<pre><code>##          Month1    Month2     Month3     
## Quarter1 &quot;January&quot; &quot;February&quot; &quot;March&quot;    
## Quarter2 &quot;April&quot;   &quot;May&quot;      &quot;June&quot;     
## Quarter3 &quot;July&quot;    &quot;August&quot;   &quot;September&quot;
## Quarter4 &quot;October&quot; &quot;November&quot; &quot;December&quot;</code></pre>
<div class="sourceCode" id="cb163"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb163-1"><a href="task-01.html#cb163-1" aria-hidden="true" tabindex="-1"></a><span class="co"># You can also use them to get the row and column names</span></span>
<span id="cb163-2"><a href="task-01.html#cb163-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rownames</span>(mat_month)  </span></code></pre></div>
<pre><code>## [1] &quot;Quarter1&quot; &quot;Quarter2&quot; &quot;Quarter3&quot; &quot;Quarter4&quot;</code></pre>
<div class="sourceCode" id="cb165"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb165-1"><a href="task-01.html#cb165-1" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(mat_month) </span></code></pre></div>
<pre><code>## [1] &quot;Month1&quot; &quot;Month2&quot; &quot;Month3&quot;</code></pre>
<div class="sourceCode" id="cb167"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb167-1"><a href="task-01.html#cb167-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Or get the names of all dimensions with a single function</span></span>
<span id="cb167-2"><a href="task-01.html#cb167-2" aria-hidden="true" tabindex="-1"></a><span class="fu">dimnames</span>(mat_month)</span></code></pre></div>
<pre><code>## [[1]]
## [1] &quot;Quarter1&quot; &quot;Quarter2&quot; &quot;Quarter3&quot; &quot;Quarter4&quot;
## 
## [[2]]
## [1] &quot;Month1&quot; &quot;Month2&quot; &quot;Month3&quot;</code></pre>
</div>
<div id="访问矩阵子集" class="section level4" number="1.4.1.2">
<h4><span class="header-section-number">1.4.1.2</span> Accessing subsets of a matrix</h4>
<p>As with vectors, you can access subsets of a matrix with <code>[</code> or <code>[[</code>. The difference is that a matrix has two dimensions, so you need to give an index for both:</p>
<div class="sourceCode" id="cb169"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb169-1"><a href="task-01.html#cb169-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the element in row 1, column 2</span></span>
<span id="cb169-2"><a href="task-01.html#cb169-2" aria-hidden="true" tabindex="-1"></a>mat_month[[<span class="dv">1</span>, <span class="dv">2</span>]]</span></code></pre></div>
<pre><code>## [1] &quot;February&quot;</code></pre>
<div class="sourceCode" id="cb171"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb171-1"><a href="task-01.html#cb171-1" aria-hidden="true" tabindex="-1"></a><span class="co"># With nothing before the comma,</span></span>
<span id="cb171-2"><a href="task-01.html#cb171-2" aria-hidden="true" tabindex="-1"></a><span class="co"># the whole column is selected by its column number</span></span>
<span id="cb171-3"><a href="task-01.html#cb171-3" aria-hidden="true" tabindex="-1"></a>mat_month[, <span class="dv">2</span>]</span></code></pre></div>
<pre><code>##   Quarter1   Quarter2   Quarter3   Quarter4 
## &quot;February&quot;      &quot;May&quot;   &quot;August&quot; &quot;November&quot;</code></pre>
<div class="sourceCode" id="cb173"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb173-1"><a href="task-01.html#cb173-1" aria-hidden="true" tabindex="-1"></a><span class="co"># With nothing after the comma,</span></span>
<span id="cb173-2"><a href="task-01.html#cb173-2" aria-hidden="true" tabindex="-1"></a><span class="co"># the whole row is selected by its row number</span></span>
<span id="cb173-3"><a href="task-01.html#cb173-3" aria-hidden="true" tabindex="-1"></a>mat_month[<span class="dv">1</span>, ]</span></code></pre></div>
<pre><code>##     Month1     Month2     Month3 
##  &quot;January&quot; &quot;February&quot;    &quot;March&quot;</code></pre>
<div class="sourceCode" id="cb175"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb175-1"><a href="task-01.html#cb175-1" aria-hidden="true" tabindex="-1"></a><span class="co"># If the matrix has row and column names,</span></span>
<span id="cb175-2"><a href="task-01.html#cb175-2" aria-hidden="true" tabindex="-1"></a><span class="co"># you can also subset with character strings</span></span>
<span id="cb175-3"><a href="task-01.html#cb175-3" aria-hidden="true" tabindex="-1"></a>mat_month[[<span class="st">&quot;Quarter1&quot;</span>, <span class="st">&quot;Month3&quot;</span>]]</span></code></pre></div>
<pre><code>## [1] &quot;March&quot;</code></pre>
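<p>The examples above each return either one cell or one simplified row or column. As a supplementary sketch, <code>[</code> can also select several rows and columns at once, and its <code>drop = FALSE</code> argument keeps the matrix shape when only one column is selected. The matrix is rebuilt here so that the block runs on its own; <code>col2</code> is a name chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">mat_month &lt;- matrix(month.name, nrow = 4, byrow = TRUE)
rownames(mat_month) &lt;- c(&quot;Quarter1&quot;, &quot;Quarter2&quot;, &quot;Quarter3&quot;, &quot;Quarter4&quot;)
colnames(mat_month) &lt;- c(&quot;Month1&quot;, &quot;Month2&quot;, &quot;Month3&quot;)

# Several rows and columns at once: the result is still a matrix
mat_month[1:2, c(&quot;Month1&quot;, &quot;Month3&quot;)]

# A single column is simplified to a vector by default ...
is.matrix(mat_month[, 2])

# ... unless drop = FALSE is given, which keeps the 4 x 1 matrix shape
col2 &lt;- mat_month[, 2, drop = FALSE]
dim(col2)</code></pre></div>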
</div>
</div>
<div id="列表list" class="section level3" number="1.4.2">
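<h3><span class="header-section-number">1.4.2</span> Lists (list)</h3>
<p>The list is the most flexible of R's basic data types. Unlike a vector or a matrix, a single list can store different basic data types. You can store three numbers, or mix numeric, character and logical values:</p>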
<div class="sourceCode" id="cb177"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb177-1"><a href="task-01.html#cb177-1" aria-hidden="true" tabindex="-1"></a><span class="fu">list</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)</span></code></pre></div>
<pre><code>## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3</code></pre>
<div class="sourceCode" id="cb179"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb179-1"><a href="task-01.html#cb179-1" aria-hidden="true" tabindex="-1"></a><span class="fu">list</span>(<span class="dv">1</span>, <span class="st">&quot;lol&quot;</span>, <span class="cn">TRUE</span>)</span></code></pre></div>
<pre><code>## [[1]]
## [1] 1
## 
## [[2]]
## [1] &quot;lol&quot;
## 
## [[3]]
## [1] TRUE</code></pre>
<p>A list can even contain other lists, which means lists can be nested layer upon layer. Mix in various other types and you can build quite a monster:</p>
<div class="sourceCode" id="cb181"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb181-1"><a href="task-01.html#cb181-1" aria-hidden="true" tabindex="-1"></a>stuff <span class="ot">&lt;-</span> <span class="fu">list</span>(</span>
<span id="cb181-2"><a href="task-01.html#cb181-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">list</span>(</span>
<span id="cb181-3"><a href="task-01.html#cb181-3" aria-hidden="true" tabindex="-1"></a>    <span class="dv">1</span><span class="sc">:</span><span class="dv">12</span>, </span>
<span id="cb181-4"><a href="task-01.html#cb181-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&quot;To be or not to be&quot;</span>, </span>
<span id="cb181-5"><a href="task-01.html#cb181-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">c</span>(<span class="cn">TRUE</span>, <span class="cn">FALSE</span>)), </span>
<span id="cb181-6"><a href="task-01.html#cb181-6" aria-hidden="true" tabindex="-1"></a>  <span class="dv">42</span>, </span>
<span id="cb181-7"><a href="task-01.html#cb181-7" aria-hidden="true" tabindex="-1"></a>  <span class="fu">list</span>(</span>
<span id="cb181-8"><a href="task-01.html#cb181-8" aria-hidden="true" tabindex="-1"></a>    <span class="fu">list</span>(</span>
<span id="cb181-9"><a href="task-01.html#cb181-9" aria-hidden="true" tabindex="-1"></a>      <span class="fu">ymd</span>(<span class="st">&quot;2021-07-07&quot;</span>), </span>
<span id="cb181-10"><a href="task-01.html#cb181-10" aria-hidden="true" tabindex="-1"></a>      <span class="st">&quot;remembrance&quot;</span>),</span>
<span id="cb181-11"><a href="task-01.html#cb181-11" aria-hidden="true" tabindex="-1"></a>    2L<span class="sc">+</span>3i) </span>
<span id="cb181-12"><a href="task-01.html#cb181-12" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb181-13"><a href="task-01.html#cb181-13" aria-hidden="true" tabindex="-1"></a>stuff</span></code></pre></div>
<pre><code>## [[1]]
## [[1]][[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12
## 
## [[1]][[2]]
## [1] &quot;To be or not to be&quot;
## 
## [[1]][[3]]
## [1]  TRUE FALSE
## 
## 
## [[2]]
## [1] 42
## 
## [[3]]
## [[3]][[1]]
## [[3]][[1]][[1]]
## [1] &quot;2021-07-07&quot;
## 
## [[3]][[1]][[2]]
## [1] &quot;remembrance&quot;
## 
## 
## [[3]][[2]]
## [1] 2+3i</code></pre>
<div id="访问子列表" class="section level4" number="1.4.2.1">
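<h4><span class="header-section-number">1.4.2.1</span> Accessing sub-lists</h4>
<p>Lists can also be subset with brackets. The distinction between the single bracket <code>[</code> and the double bracket <code>[[</code> is especially important for lists. In short, a single bracket returns the selected elements wrapped in a list, whereas a double bracket returns the element itself, with its own type. To return several elements at once you have to use the single bracket, because elements of different types cannot be stored together in one atomic vector.</p>
<div class="sourceCode" id="cb183"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb183-1"><a href="task-01.html#cb183-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Return the first two elements</span></span>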
<span id="cb183-2"><a href="task-01.html#cb183-2" aria-hidden="true" tabindex="-1"></a>stuff[<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>]</span></code></pre></div>
<pre><code>## [[1]]
## [[1]][[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12
## 
## [[1]][[2]]
## [1] &quot;To be or not to be&quot;
## 
## [[1]][[3]]
## [1]  TRUE FALSE
## 
## 
## [[2]]
## [1] 42</code></pre>
<div class="sourceCode" id="cb185"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb185-1"><a href="task-01.html#cb185-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Return the second element of the first sub-list</span></span>
<span id="cb185-2"><a href="task-01.html#cb185-2" aria-hidden="true" tabindex="-1"></a>stuff[[<span class="dv">1</span>]][[<span class="dv">2</span>]]</span></code></pre></div>
<pre><code>## [1] &quot;To be or not to be&quot;</code></pre>
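<p>The difference between the two brackets is easiest to see by checking the class of what comes back. A minimal sketch with a small list, <code>items</code>, which is a name used only for this illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">items &lt;- list(1:3, &quot;text&quot;, TRUE)

# Single bracket: the result is a list of length 1
class(items[1])

# Double bracket: the result is the element itself, here an integer vector
class(items[[1]])

# Arithmetic therefore works on items[[1]], but not on items[1]
sum(items[[1]])</code></pre></div>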
</div>
<div id="列表命名" class="section level4" number="1.4.2.2">
<h4><span class="header-section-number">1.4.2.2</span> Naming a list</h4>
<p>A list can have far more levels than a matrix has dimensions, which also means there are far more places where names can be assigned.</p>
<div class="sourceCode" id="cb187"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb187-1"><a href="task-01.html#cb187-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Name the three top-level elements of the list</span></span>
<span id="cb187-2"><a href="task-01.html#cb187-2" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(stuff) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;I&quot;</span>, <span class="st">&quot;II&quot;</span>, <span class="st">&quot;III&quot;</span>)</span>
<span id="cb187-3"><a href="task-01.html#cb187-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-4"><a href="task-01.html#cb187-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Name the three elements inside the first sub-list</span></span>
<span id="cb187-5"><a href="task-01.html#cb187-5" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(stuff[[<span class="dv">1</span>]]) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;I&quot;</span>, <span class="st">&quot;II&quot;</span>, <span class="st">&quot;III&quot;</span>)</span></code></pre></div>
<p>If a list has names, you can naturally use those names, given as strings, to get its elements.</p>
<div class="sourceCode" id="cb188"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb188-1"><a href="task-01.html#cb188-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the element named &quot;II&quot; inside the element named &quot;I&quot;</span></span>
<span id="cb188-2"><a href="task-01.html#cb188-2" aria-hidden="true" tabindex="-1"></a>stuff[[<span class="st">&quot;I&quot;</span>]][[<span class="st">&quot;II&quot;</span>]]</span></code></pre></div>
<pre><code>## [1] &quot;To be or not to be&quot;</code></pre>
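<p>So far we have not used the dollar sign <code>$</code> for subsetting, and lists are the ideal place to show it. The following line has the same effect as the previous one, without all the brackets and quotation marks:</p>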
<div class="sourceCode" id="cb190"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb190-1"><a href="task-01.html#cb190-1" aria-hidden="true" tabindex="-1"></a>stuff<span class="sc">$</span>I<span class="sc">$</span>II</span></code></pre></div>
<pre><code>## [1] &quot;To be or not to be&quot;</code></pre>
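<p>One practical difference between the two forms is worth a sketch: <code>[[</code> evaluates its argument, so the name can come from a variable, whereas <code>$</code> takes the name literally. The list below is a simplified stand-in for <code>stuff</code>, and <code>key</code> is a hypothetical variable name:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">stuff &lt;- list(I = list(I = 1:12, II = &quot;To be or not to be&quot;), II = 42)

# The name of the wanted element is stored in a variable
key &lt;- &quot;II&quot;

# [[ evaluates the variable and looks up the element named &quot;II&quot;
stuff[[&quot;I&quot;]][[key]]

# $ takes the name literally and looks for an element called &quot;key&quot;,
# which does not exist, so the result is NULL
stuff$I$key</code></pre></div>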
</div>
</div>
<div id="数据表data-frame-与-tibble" class="section level3" number="1.4.3">
<h3><span class="header-section-number">1.4.3</span> Data tables (data frame and tibble)</h3>
<p>Data tables are the data type you will encounter most often in data analysis. A data frame is essentially a list, displayed in the form of a matrix:</p>
<div class="sourceCode" id="cb192"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb192-1"><a href="task-01.html#cb192-1" aria-hidden="true" tabindex="-1"></a>df <span class="ot">&lt;-</span> <span class="fu">data.frame</span>(<span class="at">x =</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">12</span>, <span class="at">y =</span> month.abb, <span class="at">z =</span> month.name)</span>
<span id="cb192-2"><a href="task-01.html#cb192-2" aria-hidden="true" tabindex="-1"></a>df</span></code></pre></div>
<pre><code>##     x   y         z
## 1   1 Jan   January
## 2   2 Feb  February
## 3   3 Mar     March
## 4   4 Apr     April
## 5   5 May       May
## 6   6 Jun      June
## 7   7 Jul      July
## 8   8 Aug    August
## 9   9 Sep September
## 10 10 Oct   October
## 11 11 Nov  November
## 12 12 Dec  December</code></pre>
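<p>Each column of a data frame is one element of the underlying list. Placing several elements of equal length side by side produces a rectangular, matrix-like shape. This special arrangement gives data frames the advantages of both structures. Different columns can use different basic data types, so one column can be numeric while the next is character. The rectangular shape guarantees that values across columns line up one-to-one, so each row is one observation. This matches the form of data commonly encountered in practice.</p>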
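<p>Both sides of a data frame, the list and the rectangle, can be inspected with base R functions. A minimal sketch, rebuilding the same <code>df</code> so the block runs on its own:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">df &lt;- data.frame(x = 1:12, y = month.abb, z = month.name)

# Internally a data frame is a list ...
typeof(df)

# ... whose elements are the columns, so length() counts columns
length(df)

# Each column keeps its own type
sapply(df, class)

# The rectangular shape is what dim() reports: rows, then columns
dim(df)</code></pre></div>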
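<p><code>tibble</code> is a data format provided by the <code>tibble</code> package, part of the <code>tidyverse</code> family. An obvious benefit is that a <code>tibble</code> prints more cleanly and readably in the console. Whereas a data frame tries to print every row and dumps all the information on you at once, a <code>tibble</code> by default prints only the first few rows to give you a feel for the data, and also tells you the type of each column:</p>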
<div class="sourceCode" id="cb194"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb194-1"><a href="task-01.html#cb194-1" aria-hidden="true" tabindex="-1"></a>tb <span class="ot">&lt;-</span> <span class="fu">tibble</span>(<span class="at">a =</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">100</span>, <span class="at">b =</span> <span class="dv">101</span><span class="sc">:</span><span class="dv">200</span>)</span>
<span id="cb194-2"><a href="task-01.html#cb194-2" aria-hidden="true" tabindex="-1"></a>tb</span></code></pre></div>
<pre><code>## # A tibble: 100 x 2
##        a     b
##    &lt;int&gt; &lt;int&gt;
##  1     1   101
##  2     2   102
##  3     3   103
##  4     4   104
##  5     5   105
##  6     6   106
##  7     7   107
##  8     8   108
##  9     9   109
## 10    10   110
## # ... with 90 more rows</code></pre>
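<p>Besides looking nicer, <code>tibble</code> keeps the useful features of the original data frame and drops the superfluous ones. It does less: for example, it does not change variable types or variable names on its own, and it does not do partial matching. It also complains more: for example, accessing a variable that does not exist triggers a warning or an error. This way users notice mistakes early, before the code piles up into a small mountain of 💩.</p>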
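<p>The partial matching point can be illustrated with a short sketch. The column name <code>value</code> and its abbreviation <code>val</code> are chosen only for illustration, and the behaviour described in the comments is what I understand the default settings to be:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tibble)

df &lt;- data.frame(value = 1:3)
tb &lt;- tibble(value = 1:3)

# A data frame silently completes the abbreviated column name
df$val

# A tibble does not: it returns NULL and warns about the unknown column
tb$val</code></pre></div>
<p>With the data frame, a typo that happens to be a prefix of a real column goes unnoticed; with the tibble, it surfaces immediately.</p>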
<div id="访问数据表内容" class="section level4" number="1.4.3.1">
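<h4><span class="header-section-number">1.4.3.1</span> Accessing the contents of a data table</h4>
<p>Since a data table looks like a matrix and sounds like a list, it should be possible to access its elements with the methods that work for matrices and lists. And that is indeed the case:</p>
<div class="sourceCode" id="cb196"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb196-1"><a href="task-01.html#cb196-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the column named x</span></span>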
<span id="cb196-2"><a href="task-01.html#cb196-2" aria-hidden="true" tabindex="-1"></a>df[[<span class="st">&quot;x&quot;</span>]]</span></code></pre></div>
<pre><code>##  [1]  1  2  3  4  5  6  7  8  9 10 11 12</code></pre>
<div class="sourceCode" id="cb198"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb198-1"><a href="task-01.html#cb198-1" aria-hidden="true" tabindex="-1"></a>df<span class="sc">$</span>x</span></code></pre></div>
<pre><code>##  [1]  1  2  3  4  5  6  7  8  9 10 11 12</code></pre>
<div class="sourceCode" id="cb200"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb200-1"><a href="task-01.html#cb200-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the value in row 1, column 2</span></span>
<span id="cb200-2"><a href="task-01.html#cb200-2" aria-hidden="true" tabindex="-1"></a>df[<span class="dv">1</span>, <span class="dv">2</span>]</span></code></pre></div>
<pre><code>## [1] &quot;Jan&quot;</code></pre>
<div class="sourceCode" id="cb202"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb202-1"><a href="task-01.html#cb202-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access column 2 of the tibble</span></span>
<span id="cb202-2"><a href="task-01.html#cb202-2" aria-hidden="true" tabindex="-1"></a>tb[, <span class="dv">2</span>]</span></code></pre></div>
<pre><code>## # A tibble: 100 x 1
##        b
##    &lt;int&gt;
##  1   101
##  2   102
##  3   103
##  4   104
##  5   105
##  6   106
##  7   107
##  8   108
##  9   109
## 10   110
## # ... with 90 more rows</code></pre>
<div class="sourceCode" id="cb204"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb204-1"><a href="task-01.html#cb204-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the value in row 1, column 2 of the tibble</span></span>
<span id="cb204-2"><a href="task-01.html#cb204-2" aria-hidden="true" tabindex="-1"></a>tb[<span class="dv">1</span>, <span class="dv">2</span>]</span></code></pre></div>
<pre><code>## # A tibble: 1 x 1
##       b
##   &lt;int&gt;
## 1   101</code></pre>
<div class="sourceCode" id="cb206"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb206-1"><a href="task-01.html#cb206-1" aria-hidden="true" tabindex="-1"></a>tb<span class="sc">$</span>a</span>
<span id="cb206-2"><a href="task-01.html#cb206-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Too long to display here</span></span></code></pre></div>
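<p>Another feature of <code>tibble</code> is that a subset taken with the single bracket <code>[</code> is itself a <code>tibble</code>, even when it contains only the element of a single cell.</p>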
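<p>When the bare value is needed rather than a 1 x 1 <code>tibble</code>, the list-style accessors can be used instead. The following is a sketch of a few options that I believe are equivalent here, rebuilding the same <code>tb</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tibble)

tb &lt;- tibble(a = 1:100, b = 101:200)

# Single bracket: still a tibble, here with 1 row and 1 column
class(tb[1, 2])

# Double bracket with a row and a column: the bare value
tb[[1, 2]]

# Extract the column first, then index the resulting vector
tb$b[1]
tb[[&quot;b&quot;]][1]</code></pre></div>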
<p>For more on <code>tibble</code>, see the <a href="https://tibble.tidyverse.org/"><code>tibble</code> home page</a>.</p>
</div>
</div>
</div>
<div id="读写数据" class="section level2" number="1.5">
<h2><span class="header-section-number">1.5</span> Reading and writing data</h2>
<p>This section covers how to read and write data, organized by the format in which the data is stored.</p>
<div id="内置数据集" class="section level3" number="1.5.1">
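<h3><span class="header-section-number">1.5.1</span> Built-in datasets</h3>
<p>R itself and many R packages ship with built-in datasets. Use the <code>data()</code> function to list the available datasets and to load them.</p>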
<div class="sourceCode" id="cb207"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb207-1"><a href="task-01.html#cb207-1" aria-hidden="true" tabindex="-1"></a><span class="co"># List the datasets available in R and the currently attached packages</span></span>
<span id="cb207-2"><a href="task-01.html#cb207-2" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>()</span>
<span id="cb207-3"><a href="task-01.html#cb207-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb207-4"><a href="task-01.html#cb207-4" aria-hidden="true" tabindex="-1"></a><span class="co"># List the datasets bundled with a specific package</span></span>
<span id="cb207-5"><a href="task-01.html#cb207-5" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>(<span class="at">package =</span> <span class="st">&quot;dplyr&quot;</span>)</span></code></pre></div>
<div class="sourceCode" id="cb208"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb208-1"><a href="task-01.html#cb208-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the AirPassengers dataset</span></span>
<span id="cb208-2"><a href="task-01.html#cb208-2" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>(<span class="st">&quot;AirPassengers&quot;</span>)</span>
<span id="cb208-3"><a href="task-01.html#cb208-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb208-4"><a href="task-01.html#cb208-4" aria-hidden="true" tabindex="-1"></a><span class="fu">glimpse</span>(AirPassengers)</span></code></pre></div>
<pre><code>##  Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...</code></pre>
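<p>The listing returned by <code>data()</code> can also be used programmatically. The sketch below relies only on base R and the <code>datasets</code> package that ships with it; the variable name <code>available</code> is introduced here for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># data() returns an object whose `results` matrix has one row per dataset
available &lt;- data(package = &quot;datasets&quot;)$results
head(available[, c(&quot;Item&quot;, &quot;Title&quot;)])

# Load one dataset by name and inspect it with base R only
data(&quot;AirPassengers&quot;)
class(AirPassengers)      # a time series object of class &quot;ts&quot;
frequency(AirPassengers)  # number of observations per year</code></pre></div>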
</div>
<div id="表格类型数据csv-excel" class="section level3" number="1.5.2">
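<h3><span class="header-section-number">1.5.2</span> Tabular data (csv, Excel)</h3>
<p>The H1N1 flu survey data is stored in a file named “h1n1_flu.csv”, which we will use in the next chapter, “Data Cleaning and Preparation”. Supposing that “h1n1_flu” were stored in several different formats, here are some ways to read and write tabular data.</p>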
<div class="sourceCode" id="cb210"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb210-1"><a href="task-01.html#cb210-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Read a csv file</span></span>
<span id="cb210-2"><a href="task-01.html#cb210-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(readr)</span>
<span id="cb210-3"><a href="task-01.html#cb210-3" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_csv</span>(<span class="st">&quot;h1n1_flu.csv&quot;</span>)</span>
<span id="cb210-4"><a href="task-01.html#cb210-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Write a csv file</span></span>
<span id="cb210-5"><a href="task-01.html#cb210-5" aria-hidden="true" tabindex="-1"></a><span class="fu">write_csv</span>(h1n1_flu, <span class="st">&quot;h1n1_flu.csv&quot;</span>)</span>
<span id="cb210-6"><a href="task-01.html#cb210-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb210-7"><a href="task-01.html#cb210-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb210-8"><a href="task-01.html#cb210-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Read Excel files</span></span>
<span id="cb210-9"><a href="task-01.html#cb210-9" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(readxl)</span>
<span id="cb210-10"><a href="task-01.html#cb210-10" aria-hidden="true" tabindex="-1"></a><span class="co"># read_excel() detects the format (xls or xlsx) automatically</span></span>
<span id="cb210-11"><a href="task-01.html#cb210-11" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_excel</span>(<span class="st">&quot;h1n1_flu.xls&quot;</span>)</span>
<span id="cb210-12"><a href="task-01.html#cb210-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Read an xls file</span></span>
<span id="cb210-13"><a href="task-01.html#cb210-13" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_xls</span>(<span class="st">&quot;h1n1_flu.xls&quot;</span>)</span>
<span id="cb210-14"><a href="task-01.html#cb210-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Read an xlsx file</span></span>
<span id="cb210-15"><a href="task-01.html#cb210-15" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_xlsx</span>(<span class="st">&quot;h1n1_flu.xlsx&quot;</span>)</span></code></pre></div>
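<p>We do not recommend editing Excel files directly from R; csv files should cover most everyday needs. If you do need to edit Excel files, take a look at the <code>openxlsx</code> package.</p>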
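<p>A csv round trip can be tried without the survey file. The following is a minimal sketch, assuming the <code>readr</code> package is installed; the data frame <code>toy</code> and the temporary file path are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(readr)

# Hypothetical toy data standing in for a real survey table
toy &lt;- data.frame(id = 1:3, sex = c(&quot;Female&quot;, &quot;Male&quot;, NA))

# Write to a temporary file so nothing in the working directory is touched
csv_path &lt;- tempfile(fileext = &quot;.csv&quot;)
write_csv(toy, csv_path)

# Declare the column types explicitly instead of relying on type guessing
toy_back &lt;- read_csv(
  csv_path,
  col_types = cols(id = col_integer(), sex = col_character())
)
toy_back</code></pre></div>
<p>Because csv is plain text, column types are not stored in the file; spelling them out with <code>col_types</code> keeps the result predictable when the file is read back.</p>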
</div>
<div id="r的专属类型数据rdata-rds" class="section level3" number="1.5.3">
<h3><span class="header-section-number">1.5.3</span> R-specific data formats (RData, rds)</h3>
<p>Some storage formats are specific to R. We discuss two of them here: rds files and RData files.</p>
<ol style="list-style-type: decimal">
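<li>An rds file stores a single R object. The object does not have to be a rectangular data table; it can have any structure, including a complex list. Because the file stores only the object and not its name, reading it returns an object that you need to assign to a name.</li>
<li>An RData file stores one or more named objects of any structure. Loading it places the stored objects directly into the current environment under their original names, so there is no need to name them again. Note that loading overwrites any existing objects that have the same names.</li>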
</ol>
<div class="sourceCode" id="cb211"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb211-1"><a href="task-01.html#cb211-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Read an rds file (read_rds() and write_rds() come from readr)</span></span>
<span id="cb211-2"><a href="task-01.html#cb211-2" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_rds</span>(<span class="st">&quot;h1n1_flu.rds&quot;</span>)</span>
<span id="cb211-3"><a href="task-01.html#cb211-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Write an rds file</span></span>
<span id="cb211-4"><a href="task-01.html#cb211-4" aria-hidden="true" tabindex="-1"></a><span class="fu">write_rds</span>(h1n1_flu, <span class="st">&quot;h1n1_flu.rds&quot;</span>)</span>
<span id="cb211-5"><a href="task-01.html#cb211-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-6"><a href="task-01.html#cb211-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Load an RData file</span></span>
<span id="cb211-7"><a href="task-01.html#cb211-7" aria-hidden="true" tabindex="-1"></a><span class="fu">load</span>(<span class="st">&quot;h1n1_flu.RData&quot;</span>)</span>
<span id="cb211-8"><a href="task-01.html#cb211-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Save objects to an RData file</span></span>
<span id="cb211-9"><a href="task-01.html#cb211-9" aria-hidden="true" tabindex="-1"></a><span class="fu">save</span>(h1n1_flu, <span class="at">file =</span> <span class="st">&quot;h1n1_flu.RData&quot;</span>)</span></code></pre></div>
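<p>The naming difference between the two formats can be seen in a small sketch that uses only base R (<code>saveRDS()</code>/<code>readRDS()</code> and <code>save()</code>/<code>load()</code>). The object <code>cfg</code> and the temporary file paths are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A hypothetical non-tabular object: a list with mixed contents
cfg &lt;- list(name = &quot;demo&quot;, values = c(1.5, 2.5))

# rds: one object, no name stored; the reader chooses the name
rds_path &lt;- tempfile(fileext = &quot;.rds&quot;)
saveRDS(cfg, rds_path)
cfg_copy &lt;- readRDS(rds_path)
identical(cfg, cfg_copy)

# RData: the object is stored together with its name
rdata_path &lt;- tempfile(fileext = &quot;.RData&quot;)
save(cfg, file = rdata_path)
rm(cfg)

# load() recreates `cfg` under its own name and returns the loaded names
loaded_names &lt;- load(rdata_path)
loaded_names
exists(&quot;cfg&quot;)</code></pre></div>
<p>Since <code>load()</code> writes into the current environment, capturing its return value is a simple way to see which objects a file introduced.</p>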
</div>
<div id="其他软件spss-stata-sas" class="section level3" number="1.5.4">
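<h3><span class="header-section-number">1.5.4</span> Other software (SPSS, Stata, SAS)</h3>
<p>R can also read data formats from other software directly. Listed below are the <code>haven</code> functions for reading and writing SPSS sav and zsav files, Stata dta files, and SAS sas7bdat and sas7bcat files.</p>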
<div class="sourceCode" id="cb212"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb212-1"><a href="task-01.html#cb212-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(haven)</span>
<span id="cb212-2"><a href="task-01.html#cb212-2" aria-hidden="true" tabindex="-1"></a><span class="co"># The file names below are illustrative</span></span>
<span id="cb212-3"><a href="task-01.html#cb212-3" aria-hidden="true" tabindex="-1"></a><span class="co"># SPSS (read_spss() dispatches on the file extension; the writer is write_sav())</span></span>
<span id="cb212-4"><a href="task-01.html#cb212-4" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_spss</span>(<span class="st">&quot;h1n1_flu.sav&quot;</span>)</span>
<span id="cb212-5"><a href="task-01.html#cb212-5" aria-hidden="true" tabindex="-1"></a><span class="fu">write_sav</span>(h1n1_flu, <span class="st">&quot;h1n1_flu.sav&quot;</span>)</span>
<span id="cb212-6"><a href="task-01.html#cb212-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb212-7"><a href="task-01.html#cb212-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Stata</span></span>
<span id="cb212-8"><a href="task-01.html#cb212-8" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_dta</span>(<span class="st">&quot;h1n1_flu.dta&quot;</span>)</span>
<span id="cb212-9"><a href="task-01.html#cb212-9" aria-hidden="true" tabindex="-1"></a><span class="fu">write_dta</span>(h1n1_flu, <span class="st">&quot;h1n1_flu.dta&quot;</span>)</span>
<span id="cb212-10"><a href="task-01.html#cb212-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb212-11"><a href="task-01.html#cb212-11" aria-hidden="true" tabindex="-1"></a><span class="co"># SAS</span></span>
<span id="cb212-12"><a href="task-01.html#cb212-12" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="ot">&lt;-</span> <span class="fu">read_sas</span>(<span class="st">&quot;h1n1_flu.sas7bdat&quot;</span>)</span>
<span id="cb212-13"><a href="task-01.html#cb212-13" aria-hidden="true" tabindex="-1"></a><span class="co"># write_sas() is deprecated in recent haven releases; write_xpt() writes the SAS transport format instead</span></span>
<span id="cb212-14"><a href="task-01.html#cb212-14" aria-hidden="true" tabindex="-1"></a><span class="fu">write_sas</span>(h1n1_flu, <span class="st">&quot;h1n1_flu.sas7bdat&quot;</span>)</span></code></pre></div>
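<p>As a rough illustration of how these functions are called, here is a Stata round trip through a temporary file. It is a sketch that assumes the <code>haven</code> package is installed; the data frame <code>toy</code> and the file path are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(haven)

# Hypothetical toy data; the column names are assumptions for this example
toy &lt;- data.frame(id = c(1, 2, 3), score = c(4.5, 3.0, 5.0))

# Write a Stata file to a temporary location, then read it back
dta_path &lt;- tempfile(fileext = &quot;.dta&quot;)
write_dta(toy, dta_path)
toy_back &lt;- read_dta(dta_path)

# The result is a tibble; as.numeric() drops any Stata-specific attributes
as.numeric(toy_back$score)</code></pre></div>
<p>The SPSS and SAS readers are called the same way, with the file path as the first argument.</p>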
</div>
</div>
<div id="练习题" class="section level2" number="1.6">
<h2><span class="header-section-number">1.6</span> Exercises</h2>
<div id="了解数据集" class="section level3" number="1.6.1">
<h3><span class="header-section-number">1.6.1</span> Getting to know the dataset</h3>
<p>Use the <code>h1n1_flu</code> data read in earlier to complete the following tasks.</p>
<div id="常用数据探查函数" class="section level4" number="1.6.1.1">
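<h4><span class="header-section-number">1.6.1.1</span> Common data exploration functions</h4>
<p>Try out the common data exploration functions below, then pick your two favorites and describe what they do. Remember that <code>?fun</code> opens the help page for a function <code>fun</code>.</p>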
<div class="sourceCode" id="cb213"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb213-1"><a href="task-01.html#cb213-1" aria-hidden="true" tabindex="-1"></a><span class="fu">glimpse</span>(h1n1_flu)</span>
<span id="cb213-2"><a href="task-01.html#cb213-2" aria-hidden="true" tabindex="-1"></a><span class="fu">str</span>(h1n1_flu)</span>
<span id="cb213-3"><a href="task-01.html#cb213-3" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(h1n1_flu)</span>
<span id="cb213-4"><a href="task-01.html#cb213-4" aria-hidden="true" tabindex="-1"></a><span class="fu">tail</span>(h1n1_flu)</span>
<span id="cb213-5"><a href="task-01.html#cb213-5" aria-hidden="true" tabindex="-1"></a><span class="fu">View</span>(h1n1_flu)</span>
<span id="cb213-6"><a href="task-01.html#cb213-6" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(h1n1_flu)</span>
<span id="cb213-7"><a href="task-01.html#cb213-7" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(h1n1_flu)</span>
<span id="cb213-8"><a href="task-01.html#cb213-8" aria-hidden="true" tabindex="-1"></a><span class="fu">length</span>(h1n1_flu<span class="sc">$</span>sex)</span>
<span id="cb213-9"><a href="task-01.html#cb213-9" aria-hidden="true" tabindex="-1"></a><span class="fu">class</span>(h1n1_flu<span class="sc">$</span>sex)</span>
<span id="cb213-11"><a href="task-01.html#cb213-11" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(h1n1_flu<span class="sc">$</span>sex)</span></code></pre></div>
</div>
<div id="分组计算统计量" class="section level4" number="1.6.1.2">
<h4><span class="header-section-number">1.6.1.2</span> Computing grouped statistics</h4>
<div class="sourceCode" id="cb214"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb214-1"><a href="task-01.html#cb214-1" aria-hidden="true" tabindex="-1"></a>h1n1_flu <span class="sc">%&gt;%</span> </span>
<span id="cb214-2"><a href="task-01.html#cb214-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">group_by</span>(sex, employment_status) <span class="sc">%&gt;%</span> </span>
<span id="cb214-3"><a href="task-01.html#cb214-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarise</span>(<span class="fu">n</span>())</span></code></pre></div>
<pre><code>## # A tibble: 8 x 3
## # Groups:   sex [2]
##   sex    employment_status  `n()`
##   &lt;chr&gt;  &lt;chr&gt;              &lt;int&gt;
## 1 Female Employed            7416
## 2 Female Not in Labor Force  6918
## 3 Female Unemployed           735
## 4 Female &lt;NA&gt;                 789
## 5 Male   Employed            6144
## 6 Male   Not in Labor Force  3313
## 7 Male   Unemployed           718
## 8 Male   &lt;NA&gt;                 674</code></pre>
<p>What do the lines of code above compute? Can you use the same approach to compute some other statistics? Remember to check the help page <code>?summarise</code>.</p>
</div>
</div>
<div id="创造数据集" class="section level3" number="1.6.2">
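<h3><span class="header-section-number">1.6.2</span> Creating a dataset</h3>
<p>As we said earlier, a data frame is essentially a list whose elements are placed side by side as columns, so a data frame has the properties of a list. We also know that a list can hold any kind of data, whether a single value, a vector, a matrix, or something else.</p>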
<ol style="list-style-type: decimal">
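<li>Create a data frame in which each cell of one column holds not the usual single number or string but a multi-element object such as a vector or a matrix;</li>
<li>Describe a data analysis scenario in which this feature would be useful.</li>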
</ol>
</div>
</div>
<div id="本章作者-1" class="section level2 unnumbered">
<h2>Chapter author</h2>
<p><strong>Fin</strong></p>
<blockquote>
<p><a href="https://yangzhuoranyang.com" class="uri">https://yangzhuoranyang.com</a></p>
</blockquote>
</div>
<div id="关于datawhale-1" class="section level2 unnumbered">
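<h2>About Datawhale</h2>
<p>Datawhale is an open-source organization focused on data science and AI. It brings together outstanding learners from universities and well-known companies across many fields, forming a team whose members share an open-source and exploratory spirit. With the vision “for the learner, growing together with learners”, Datawhale encourages authentic self-expression, openness and inclusiveness, mutual trust and support, willingness to experiment, and taking responsibility. Datawhale also applies open-source principles to open-source content, open-source learning, and open-source solutions, supporting the training and growth of talent and building connections between people, between people and knowledge, between people and companies, and between people and the future. The topic material for this data mining learning path will be shared on Tianchi; follow Datawhale for details:</p>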
<p><img src="image/logo.png" width="129" /></p>

</div>
</div>
            </section>

          </div>
        </div>
      </div>
<a href="task-00.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="task-02.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
    </div>
  </div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
<script src="libs/gitbook-2.6.7/js/clipboard.min.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-clipboard.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": false,
"twitter": false,
"linkedin": true,
"weibo": true,
"instapaper": false,
"vk": false,
"whatsapp": false,
"all": ["facebook", "twitter", "linkedin", "weibo", "instapaper", "whatsapp"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": null,
"text": null
},
"history": {
"link": null,
"text": null
},
"view": {
"link": "https://github.com/FinYang/RLearning-book/blob/main/Task01_Data_Structure.Rmd",
"text": null
},
"download": ["RLearning.pdf"],
"toc": {
"collapse": "subsection"
}
});
});
</script>

</body>

</html>
